COMPLEX AUDITORY SIGNALS
Final Report: Complex Auditory Signals, AFOSR-85-0374.

The following is a summary of the research for the period September 15, 1985 to September 1, 1988. We summarize our past research in terms of three major themes: 1) synchrony detection, 2) perception of nonstationary spectra, and 3) basic properties of profile analysis.

1. Synchrony Detection

One major theme of our past research has been synchrony detection, pursued largely by Dr. V. M. Richards. Briefly, she claims that the perception of many complex acoustic stimuli depends on simultaneous comparisons of dynamic changes occurring in different spectral regions. Specifically, Dr. Richards believes that envelope comparisons (Ref. 11, 12) can be made across different spectral bands and that the correlation between these envelopes is a major cue to the coherence and grouping found in many complex acoustic stimuli.
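The envelope comparison at the heart of this idea can be illustrated with a short sketch. Each band's envelope is taken as the magnitude of the analytic signal (built with the FFT, the same weighting used by `scipy.signal.hilbert`), and the two envelopes are correlated. The carrier frequencies, the smoothed-noise envelope generator, and all function names are hypothetical illustrations, not a description of Dr. Richards's stimuli or analysis.

```python
import numpy as np

def hilbert_envelope(x):
    """Envelope = magnitude of the analytic signal, built in the frequency
    domain (same weighting scheme as scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

def envelope_correlation(x, y):
    """Pearson correlation between the envelopes of two band signals."""
    return float(np.corrcoef(hilbert_envelope(x), hilbert_envelope(y))[0, 1])

def slow_envelope(rng, n, smooth=200):
    """Hypothetical positive, slowly varying envelope: smoothed Gaussian
    noise rescaled to stay between 0.1 and 1.9."""
    s = np.convolve(rng.standard_normal(n), np.ones(smooth) / smooth, mode="same")
    return 1.0 + 0.9 * s / np.max(np.abs(s))

# Two carriers in different spectral regions (1000 and 3000 Hz are assumptions)
rng = np.random.default_rng(2)
fs, n = 20000, 10000
t = np.arange(n) / fs
shared, other = slow_envelope(rng, n), slow_envelope(rng, n)
low_band = shared * np.sin(2 * np.pi * 1000 * t)
high_correlated = shared * np.sin(2 * np.pi * 3000 * t)
high_independent = other * np.sin(2 * np.pi * 3000 * t)
r_correlated = envelope_correlation(low_band, high_correlated)
r_independent = envelope_correlation(low_band, high_independent)
```

With a shared envelope the correlation is close to 1; with independent envelopes it lies near 0. That contrast is what a listener must detect when asked to distinguish correlated from uncorrelated noise bands.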

The idea of cross-spectral comparison has been around for some time in the acoustics community. Interest in this idea was considerably stimulated by the experiment of Hall, Haggard, and Fernandes (1984). Before that time, cross-spectral comparison was generally considered to be of no importance. The basis for that opinion was an unpublished technical report by Schubert and Nixon (1970). In their experiments, subjects were asked to distinguish between correlated and uncorrelated noise bands, and they found that such discrimination was impossible. Richards (Ref. 10; see the list below) found that such discrimination was possible and traced the earlier failure to a poor choice of the frequency location and duration of the noise bands.

This effort is probably one of the most interesting current developments in psychoacoustics. Whereas we previously thought such comparisons were impossible, we now know that they are possible and can be made with some precision. In effect, these results suggest that a new kind of auditory process must be carefully considered in explaining the perception of any complex auditory signal.

Dr. Richards is finishing the last year of her NIH post-doctoral fellowship and has applied for a FIRST award from NIH to further support this research. We presume, for the purposes of this proposal, that such support will be forthcoming; thus, we will not request future funding in this proposal to support research on this very important topic. Although she is welcome to stay at Florida and pursue her research, she is naturally looking for a full faculty position and will undoubtedly secure one eventually.

2. Perception of Nonstationary Spectra

A second general theme of our past research has been the perception of nonstationary spectra. The specific research involves complex auditory spectra containing components that are amplitude modulated; the resulting paper (Ref. 8) should appear shortly. Although amplitude modulation is known to greatly increase the saliency of individual components of a complex spectrum, such modulation does little to increase the detectability of amplitude changes in these components. Except for the highest-frequency components (f > 2000 Hz), amplitude modulation tends to make changes in the level of components less detectable. One experimental condition allowed us to estimate the upper modulation rate for which the relative phase of the modulation was important. That rate appears to be about 40 Hz and to be the same for frequency regions as diverse as 250, 1000, and 4000 Hz. For modulation rates above this value, the phase of modulation among the various components of the complex can be ignored; only the power spectra of the stimuli are important.
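A stimulus of this kind can be sketched as a sum of sinusoidal carriers, each amplitude modulated at a common rate, with a separate modulator phase per component so that the relative phase of modulation can be manipulated. The modulation depth, duration, sampling rate, random carrier phases, and the function name are assumptions for illustration; the actual parameters of Ref. 8 are not given here.

```python
import numpy as np

def am_complex(freqs_hz, fm_hz, mod_phases_rad, depth=1.0, dur_s=0.1, fs=20000, seed=0):
    """Sum of sinusoidal carriers, each amplitude modulated at fm_hz.

    mod_phases_rad[j] is the starting phase of component j's modulator, so the
    relative phase of modulation across components can be varied. Carrier
    phases are drawn at random (an illustrative assumption)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(round(dur_s * fs))) / fs
    x = np.zeros_like(t)
    for f, phi in zip(freqs_hz, mod_phases_rad):
        envelope = 1.0 + depth * np.sin(2 * np.pi * fm_hz * t + phi)
        carrier = np.sin(2 * np.pi * f * t + rng.uniform(0.0, 2 * np.pi))
        x += envelope * carrier
    return t, x

# In-phase versus staggered modulation of the same three carriers
t, in_phase = am_complex([250.0, 1000.0, 4000.0], 10.0, [0.0, 0.0, 0.0])
_, staggered = am_complex([250.0, 1000.0, 4000.0], 10.0, [0.0, np.pi, 0.0])
```

Both calls use the same seed, so the carriers are identical and only the modulator phase of the 1000-Hz component differs. By the result above, listeners should be insensitive to such a phase change once the modulation rate exceeds roughly 40 Hz.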

A second aspect of nonstationary spectra is the detection of amplitude variation occurring over the entire spectrum, such as the amplitude modulation of noise, or a silent gap inserted in ongoing noise. In a recent paper (Ref. 7), we compared human performance in such tasks with a modification of a model first proposed by Viemeister (1979). Viemeister's model, typical of a wide class of models, accounts for such detection by using a decision rule that computes the variance of the intensity fluctuations at the output of the detection process. Our modification was to compute the maximum-to-minimum ratio observed in the same output. It is, however, possible to argue that such detection depends on a comparison of level fluctuations across different frequency channels, that is, synchrony detection, because amplitude modulation or the presence of a gap occurs at the same time at all frequency locations. Thus, one might also explore the extent to which gap detection can be explained on the basis of detecting simultaneous patterns of output observed over several spectral channels (synchrony detection). Gap detection and amplitude modulation detection are major components of one of our future research initiatives and will be discussed in greater detail later in this report.
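The two decision statistics can be contrasted in a minimal sketch. Here the detector's output is approximated by squaring the waveform and smoothing it with a moving average; the moving average stands in for the model's low-pass stage, the 3-ms window is an assumed value, and no peripheral band-pass filter is included. The function names are hypothetical, and this is not the fitted model of Ref. 7.

```python
import numpy as np

def smoothed_power(x, fs, smooth_ms=3.0):
    """Crude detector output: square the waveform, then apply a moving
    average that stands in for the model's low-pass filter (3 ms assumed)."""
    n = max(1, int(round(smooth_ms * 1e-3 * fs)))
    return np.convolve(x ** 2, np.ones(n) / n, mode="valid")

def variance_statistic(out):
    """Viemeister-style decision variable: variance of the output fluctuation."""
    return float(np.var(out))

def max_min_statistic(out, floor=1e-12):
    """Modified decision variable: maximum-to-minimum ratio of the output.
    The floor avoids division by zero when the output reaches 0 inside a gap."""
    return float(np.max(out) / max(float(np.min(out)), floor))

# Broadband noise with and without a 10-ms silent gap
rng = np.random.default_rng(1)
fs = 20000
noise = rng.standard_normal(int(0.3 * fs))
gapped = noise.copy()
gapped[3000:3200] = 0.0
```

A silent gap drives the smoothed output to zero, so the maximum-to-minimum ratio grows sharply, whereas the variance statistic is diluted by the many samples outside the gap.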

3. Basic Properties of Profile Analysis

The third and final area of effort has concerned the basic properties of profile analysis. Since this will be a major theme of our proposed research, we describe it only briefly here. We know from a number of previous studies that the smallest detectable increment in the intensity of a single component of a multicomponent complex occurs when the frequency of the incremented component lies in the middle of the spectrum (Ref. 9). The detection of complex amplitude changes throughout the spectrum is currently not understood in terms of simple integration of the detectability of the change in single components (Ref. 5). However, it is possible to suggest a simple computational scheme to account for the detectability of most complex changes (Ref. 3). Unfortunately, this computational scheme can be shown to fail systematically for one class of stimulus change, one that involves changes in the spectral density of the components. The systematic exploration of this variable, the number of components used to represent the complex spectrum, indicates that the apparent analysis band of the listener (the profile critical band) is about the same size as the conventional critical band (Ref. 2).

In addition to these substantive papers, we have also contributed summaries of this research area. Such efforts help to organize our own thinking about the area and provide succinct summaries of this research for active researchers in this and neighboring fields. Refs. 1, 4, and 6 are examples of such efforts.
8) Green, D. M., and Nguyen, Q. T. (1988) "Profile analysis: Detecting dynamic spectral changes," to appear in Hearing Research.

9) Green, D. M., Onsan, Z. A., and Forrest, T. G. (1987) "Frequency effects in profile analysis and detecting complex spectral changes," J. Acoust. Soc. Am. 81, 692-699.

10) Richards, V. M. (1987) "Monaural envelope correlation perception," J. Acoust. Soc. Am. 82, 1621-1630.

11) Richards, V. M. (1987) "Components of monaural envelope correlation perception," submitted for publication in Hearing Research.
5. Personnel


The major technical personnel are listed below, along with comments on their present status.

Dr. Les Bernstein, January 1986 to January 1988. Dr. Bernstein is now at the University of Connecticut Medical Center. We have nearly completed negotiations with Dr. Bruce Berg (Ph.D., University of Indiana, 1987) to replace Dr. Bernstein. He is presently a research fellow in the radiology section of the Harvard Medical School and will join the laboratory in July 1988.

Dr. Virginia M. Richards, NIH postdoctoral fellow, June 1985 to present. She has been invited to continue her research at the laboratory but is actively seeking a 'real' job.

Dr. Timothy Forrest, assistant in psychoacoustics, October 1985 to present. Dr. Forrest is an entomologist by training and is actively seeking an academic position in that area.

Mr. Quang Nguyen (B.S. Electrical Engineering, University of Florida, 1986).

Ms. Zekiye Onsan (B.S. Astronomy, University of Istanbul, Turkey, 1977).

Mr. Richard Newton (B.A. Computer Science, University of Florida, 1988), now working for Intel Corporation, Santa Clara, California.

Mr. Timothy Tucker (B.S. Electrical Engineering, University of Florida, expected May 1988).

Ms. Jill Johnson Raney, graduate student, 2nd year.

Ms. Cheryl Williams, secretary and laboratory coordinator.
THE DETECTION OF SPECTRAL SHAPE CHANGE

Leslie R. Bernstein, Virginia Richards, and David M. Green

Psychology Department, University of Florida,
Gainesville, Florida 32611, U.S.A.

Introduction

We describe several experiments involving the detection of a change in the spectral shape of a complex auditory signal, what we call profile analysis. All of the experiments are discrimination tasks involving a broadband "standard" spectrum and some alteration of that spectrum produced by adding a "signal" to the standard. For all of the experiments described here, we used a standard composed of a set of equal-amplitude sinusoidal components. The spectrum of the standard was, therefore, essentially flat. In different experiments, various waveforms were added to this standard to create changes in its spectral shape, and the ability to detect such changes was measured.

In the first experiments, we describe how the relative phase among the components of the standard waveform influences the detection of a signal. The results are very simple: phase seems to play no important role. The detection of a change in spectral shape appears to depend only on changes in the power spectrum of the signal and is independent of the temporal waveform. Next, we describe how the detection of an increment in a single component depends on the frequency of that component. These results provide the basic data to evaluate complex changes in the whole spectrum, such as a sinusoidal ripple in the amplitudes of the components over the entire spectrum. Our data indicate that there is a sizable discrepancy between the ability to detect changes occurring over the entire spectrum and the ability to detect changes in single components.

Procedure

We used a two-alternative, forced-choice procedure to evaluate the detectability of the change in spectral shape. In one interval, the listener heard the "standard" sound; in the other interval, the listener heard the "standard plus signal." The signal component was always added at a fixed phase relation to the standard component, generally in phase. An adaptive two-down, one-up rule was used to estimate 70.7% correct detection. The thresholds reported are the signal amplitude re the component of the standard to which the signal is added; a threshold of 0 dB means that the signal and standard components are equal in amplitude. Typically, the average threshold was based on at least 12 runs of 50 trials. Each sound was generated digitally and presented for about 100 msec.
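The adaptive rule can be sketched as follows. Two consecutive correct responses lower the signal level and each incorrect response raises it, so the track converges on the level at which two correct responses in a row occur half the time, that is, p = √0.5 ≈ 0.707. Levels are expressed as in the text, signal amplitude re the standard component in dB. The starting level, step size, and the rule of averaging the last few reversals are assumptions, not the laboratory's documented settings.

```python
from dataclasses import dataclass, field

@dataclass
class TwoDownOneUp:
    """Two-down, one-up adaptive track (converges near 70.7% correct).

    level_db is the signal amplitude re the standard component, in dB
    (0 dB = equal amplitudes). Start level and step size are assumptions."""
    level_db: float = 0.0
    step_db: float = 2.0
    reversals: list = field(default_factory=list)
    _run: int = 0          # consecutive correct responses since last change
    _last_dir: int = 0     # +1 = last change was up, -1 = down, 0 = none yet

    def update(self, correct: bool) -> float:
        direction = 0
        if correct:
            self._run += 1
            if self._run == 2:           # two in a row: make the task harder
                direction, self._run = -1, 0
        else:                            # any error: make the task easier
            direction, self._run = +1, 0
        if direction:
            if self._last_dir and direction != self._last_dir:
                self.reversals.append(self.level_db)   # level at the turnaround
            self._last_dir = direction
            self.level_db += direction * self.step_db
        return self.level_db

    def threshold(self, n_last: int = 6) -> float:
        """Mean of the last n_last reversal levels (an assumed estimator)."""
        last = self.reversals[-n_last:]
        return sum(last) / len(last)

track = TwoDownOneUp()
for response in [True, True, True, True, False, True, True, False]:
    track.update(response)
```

Starting from 0 dB with 2-dB steps, this response sequence produces reversals at -4, -2, and -4 dB and leaves the track at -2 dB.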

The standard spectrum was composed of a sum of sinusoidal components. Except for one experiment in which the number of components was varied, there were 21 components extending in frequency from 200 to 5000 Hz. The ratio of the frequencies of successive components was constant; that is, the frequencies were spaced equally on a logarithmic scale. Because distance along the basilar membrane is proportional to the logarithm of frequency, our components provided a roughly uniform stimulus over the linear receptor surface of the cochlea.
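With a constant ratio between neighbors, the component frequencies form a geometric series, which `numpy.geomspace` produces directly. A short sketch (the function name is hypothetical):

```python
import numpy as np

def log_spaced_freqs(n=21, f_lo=200.0, f_hi=5000.0):
    """Component frequencies with a constant ratio between neighbors."""
    return np.geomspace(f_lo, f_hi, n)

freqs = log_spaced_freqs()
ratio = freqs[1] / freqs[0]        # about 1.175 for 21 components
```

For 21 components from 200 to 5000 Hz, the ratio between neighbors is 25^(1/20), about 1.175, and the middle component falls at 1000 Hz because 1000 Hz is the geometric mean of 200 and 5000 Hz. The same holds for 5, 11, or 43 components, so a 1-kHz component can serve as the central one in each case.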

One final experimental feature must be clearly understood. Because we are interested in the detection of a change in spectral shape, we must ensure that the observer is not simply discriminating a change in intensity at a single frequency region. To do this, we randomly varied the overall level of the sound on each and every presentation. The level of the sound was chosen from a rectangular distribution of intensity covering a range of 20 or 40 dB in 1-dB steps. The median level was about 50 to 60 dB SPL. Thus, while the "flat" standard might be presented at 71 dB, the altered spectrum, the "signal plus standard," might be presented at 34 dB on a given trial of the forced-choice procedure. The observer's task was to detect the sound with the altered spectral shape despite the difference in overall level.
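The level randomization can be sketched as a uniform draw over 1-dB steps, made independently for every interval. The median, range, reference amplitude, and function names below are illustrative assumptions; the sketch does not model calibration to dB SPL.

```python
import numpy as np

def rove_level(rng, median_db=50.0, range_db=40.0, step_db=1.0):
    """Overall level drawn from a rectangular (uniform) distribution in
    step_db steps spanning range_db, centered on median_db."""
    n_steps = int(round(range_db / step_db))
    offsets = np.arange(n_steps + 1) * step_db - range_db / 2
    return float(median_db + rng.choice(offsets))

def scale_to_level(x, level_db, ref_rms=1.0):
    """Rescale waveform x so its RMS is level_db above ref_rms
    (an arbitrary digital reference, not a calibrated dB SPL)."""
    rms = np.sqrt(np.mean(x ** 2))
    return x * (ref_rms * 10 ** (level_db / 20) / rms)

# Each interval of a trial gets its own independent level draw
rng = np.random.default_rng(0)
standard_level, signal_level = rove_level(rng), rove_level(rng)
```

Because the two intervals are roved independently, overall level carries no information about which interval contains the signal; only the spectral shape does.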

Effects of phase

In most of the experiments concerning profile analysis, the phase of each component of the multitonal complex has been chosen at random, and the same waveform (except for random variation of level) is presented during each "non-signal" interval. Therefore, the logical possibility exists that observers might recognize some aspect or aspects of the temporal waveform. If this were true, then discrimination could be based on some alteration of the temporal waveform during the "signal" interval rather than on a change in the spectral shape of the stimulus per se.

Green and Mason (1985) investigated this possibility directly with the following experimental manipulations. Multicomponent complexes were generated which consisted of 5, 11, 21, or 43 components spaced logarithmically. In all cases, the frequency of the lowest component was 200 Hz and that of the highest was 5 kHz. The overall level of the complex was varied randomly over a 40-dB range across presentations, with a median level of 45 dB SPL per component. The signal consisted of an increment to the 1-kHz, central component of the complex.

In what Green and Mason termed the "fixed-phase" condition, four different complexes were generated for each number of components (5, 11, 21, and 43) by randomly selecting the phase of each component. Note that for these fixed-phase conditions, the same waveform (except for random variation of overall level) occurred during each non-signal interval.

In what Green and Mason called the "random-phase" conditions, 88 different phase-randomizations of the multicomponent complex were generated. On each interval of each trial, one of the 88 waveforms was selected at random (with replacement) for presentation. Thus, the temporal waveforms generally differed on each presentation. The amplitude spectra, however, were identical.
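The two phase conditions can be sketched as follows: a bank of random phase sets is generated once, and either one set is reused throughout a block (fixed phase) or a set is drawn with replacement on every interval (random phase). Every waveform built this way has the same amplitude spectrum. The function names, sampling rate, duration, and the three test frequencies are assumptions for illustration.

```python
import numpy as np

def complex_tone(freqs, phases, dur_s=0.1, fs=20000):
    """Equal-amplitude sum of sinusoids with the given starting phases."""
    t = np.arange(int(round(dur_s * fs))) / fs
    freqs = np.asarray(freqs, dtype=float)
    phases = np.asarray(phases, dtype=float)
    return np.sin(2 * np.pi * np.outer(freqs, t) + phases[:, None]).sum(axis=0)

def make_phase_sets(rng, n_components, n_sets):
    """n_sets independent randomizations of the component phases."""
    return rng.uniform(0.0, 2 * np.pi, size=(n_sets, n_components))

def interval_phases(rng, phase_sets, condition, block_index=0):
    """'fixed': reuse one set throughout the block; 'random': draw a set
    with replacement on every interval."""
    if condition == "fixed":
        return phase_sets[block_index]
    return phase_sets[rng.integers(len(phase_sets))]

rng = np.random.default_rng(0)
freqs = [200.0, 1000.0, 5000.0]    # whole cycles in 100 ms, so FFT bins are exact
bank = make_phase_sets(rng, len(freqs), 88)
a = complex_tone(freqs, bank[0])
b = complex_tone(freqs, bank[1])
```

The waveforms `a` and `b` differ sample by sample, yet their FFT magnitudes match, which is the property the random-phase condition exploits.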


Figure 1. Signal threshold (dB) as a function of the number of components in the complex. Open circles: data obtained for each of the four phase-randomizations when the phase of each component was fixed throughout a block of trials ("fixed-phase" condition). Filled triangles: data from the "random-phase" condition in which the phases of the components were chosen at random on each presentation.


The results are presented in Figure 1. For each value of component number, the open circles represent the thresholds obtained for each of the four randomizations in the fixed-phase condition. The triangles represent the data obtained in the random-phase conditions. The results indicate that changing the phase of the individual components, and thus the characteristics of the temporal waveform, has little, if any, effect on discrimination, even if the waveform is chosen at random on each and every presentation. These data are consistent with those obtained by Green, Mason, and Kidd (1984), who generated waveforms using a procedure similar to the fixed-phase condition described above.

That changing the phase of the individual components, and thus the characteristics of the temporal waveform, does not affect discrimination supports the view that, in these tasks, observers are indeed basing their judgments on changes in spectral shape.

The form of the function relating threshold to the number of components in the complex is one that has been replicated many times in our laboratory. In general, as the number of components, and thus the density of the profile, is increased from 3 to 11 or 21, performance improves. An intuitive explanation for this result is that as the number of components composing the profile is increased, additional independent bands or channels contribute to an estimate of the "level" of the profile.

Further increases in the density of the profile lead to decrements in performance, and this trend is, for the most part, explained by simple masking. When the components are spaced so closely that several components fall within the "critical band" of the signal, the addition of the signal produces a smaller relative increase in intensity and thus becomes more difficult to detect. In future publications we will present a more detailed analysis of these effects.
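The masking argument can be made concrete with a little arithmetic. Suppose n equal-amplitude components fall within one critical band, their powers add (as they approximately do with random phases), and the signal is added in phase to one of them with relative amplitude r. The band level then rises by 10·log10[(n - 1 + (1 + r)²)/n] dB, which shrinks as n grows. The flat-filter and power-summation assumptions are simplifications, and the function name is hypothetical.

```python
import math

def band_level_change_db(r, n_in_band):
    """Level increase (dB) within one critical band when a signal of relative
    amplitude r is added in phase to one of n_in_band equal components.
    Assumes the components' powers add and the filter is flat over the band."""
    bumped_power = (1.0 + r) ** 2          # the incremented component
    return 10 * math.log10((n_in_band - 1 + bumped_power) / n_in_band)

r = 10 ** (-10 / 20)                       # a signal at -10 dB re the component
changes = [band_level_change_db(r, n) for n in (1, 2, 4, 8)]
```

For a signal at -10 dB, the band level change drops from about 2.4 dB with one component in the band to about 1.4, 0.7, and 0.4 dB with two, four, and eight.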

Frequency Effects

The results discussed above suggest that detection of an increment to a single component of a multicomponent complex is based on changes in spectral shape. The phase relation among the components appears to have little, if any, effect on performance.

In exploring the nature of this process, one fundamental question is whether the frequency of the component which is incremented (the frequency region where the change in the power spectrum occurs) greatly influences the ability to detect a change in spectral shape.

This question also bears on that of how the auditory system codes intensity. At least two different mechanisms have been proposed as the basis for detecting changes in the intensity of sinusoidal components. One is what we will call the "rate" model. It assumes that changes in acoustic intensity are coded as changes in the rate at which fibers of the eighth nerve fire. One limitation of this model is that the firing rates of practically all auditory fibers saturate as the intensity of the stimulus is increased (Kiang, 1965; Sachs and Abbas, 1974; Evans and Palmer, 1980). The dynamic range of firing rate for many fibers is only about 20 to 30 dB. On the other hand, some residual information may remain in small changes of rate even at the highest stimulus levels, where increasing the intensity of the stimulus changes the rate only slightly. There is also the question of how one should regard saturation when one considers the entire population of fibers that may respond to a given stimulus, since different populations of fibers may saturate at different intensities.

A second view of intensity coding stresses the temporal characteristics of neural discharges. Sachs and Young (1979) and Young and Sachs (1979) have demonstrated that "neural spectrograms" based on neural synchrony measures preserve the shape of speech spectra better than those based on firing rate. We were, therefore, particularly interested in how well observers could detect a change in spectral shape at very high frequencies. At the highest frequencies, above 2000 Hz, neural synchrony deteriorates; if that code were used to signal changes in spectral shape, then the ability to detect such alterations in the acoustic spectrum should also deteriorate.

In one previous study (Green and Mason, 1985), we made some measurements of how the locus in frequency affects the ability to detect a change in a complex spectrum. Our results suggested that the mid-frequency region, 500 to 2000 Hz, yielded the best performance, but variability among the different observers was sizable. Also, those data may have been contaminated by the listeners having received substantial prior practice with signals in the middle of the range.

The results of our most extensive experiment on this issue (Green, Onsan, and Forrest, 1986) are shown in Figure 2. The standard spectrum is a complex of 21 components, all equal in amplitude and equally spaced in logarithmic frequency. The overall level of the standard was varied over a 20-dB range with a median level of 40 dB SPL per component. The signal, whose frequency is plotted along the abscissa of the figure, was an increment in the intensity of a single component. The ordinate, like that of Fig. 1, is the signal level re the component level to which it was added. The results show that best detection occurs in a frequency range of 300 to 3000 Hz, with only a mild deterioration at the higher and lower frequencies. If detection of an increment in this task were mediated by changes in neural synchrony, one would expect considerably poorer performance at the highest frequencies than at the middle and low frequencies. This did not occur.


Figure 2. Signal threshold (dB) as a function of the frequency of the signal. Twenty-one-component complexes were employed. The signal was added in phase to the corresponding component in the complex.
One other result from this recent study also deserves mention. The experiment described immediately above was repeated with one important exception: the median level of the standard was 60 rather than 40 dB SPL. This higher intensity level would be expected to produce firing rates at or close to saturation in nearly all fibers. Despite this fact, the thresholds obtained were, in almost all cases, lower than those obtained at the lower intensity level.

In conclusion, these two results do not allow us to determine the underlying neural code that mediates the detection of a change of spectral shape in our experiments.

Complex Spectral Changes

The experiments described above involve changes in the intensity of a single component of the multicomponent profile (a "bump" in the spectrum). We now turn our attention to more complicated manipulations: experiments in which the intensities of several components of the spectrum were altered simultaneously. A primary goal of these experiments was to determine whether listeners' ability to detect these complex changes could be predicted on the basis of their sensitivity to changes in the intensity of a single component in the profile.


Figure 3. Three different ripple frequencies, k, using sinusoidal variation. The signal amplitude at each component frequency is given by Eq. 1 and is added to the standard with a relative amplitude about 1/5 the standard amplitude.


Once again, a flat "standard" composed of logarithmically spaced components ranging from 200 to 5000 Hz was used. The signal, however, had an amplitude spectrum that varied sinusoidally. The amplitude of the ith component, a[i], was given by

$$ a[i] = \sin\left( \frac{2\pi k\, i}{M} \right), \qquad i = 1, \ldots, M \qquad \text{(Eq. 1)} $$

where k represents the "frequency" of the variation and M is the number of components presented. We refer to this variation in amplitude as a "sinusoidally rippled" spectrum, and to k as the "ripple frequency." Figure 3 illustrates the result of in-phase addition of the "standard" and the "signal" for the case M = 21. The three values of k are as indicated. Cosinusoidally rippled amplitude spectra have also been examined. Such signals are generated as described above, except that the sine term of Eq. 1 is replaced by cosine.

Two points deserve note. The first is that k, the frequency of the ripple, is restricted by the number of components: it must be smaller than one half the number of components (k < M/2). Second, changing the value of k does not alter the signal's root-mean-square (RMS) amplitude. For every such k, the squared a[i]'s sum to M/2; when k and M share no common factor, the a[i]'s are exactly the same values and only their order is changed.
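Eq. 1 and the two points above can be checked numerically. The sketch computes the signal amplitudes a[i], forms the altered spectrum by in-phase addition to a flat standard of unit amplitude (using the relative scale of about 1/5 from the Figure 3 caption), and confirms that the RMS of a[i] is √0.5 for every k from 1 to 10. The function names are hypothetical.

```python
import numpy as np

def ripple_amplitudes(k, M=21, kind="sin"):
    """Signal amplitudes from Eq. 1: a[i] = sin(2*pi*k*i/M), i = 1..M
    (cosine in place of sine for the cosinusoidal ripple)."""
    if not 0 < k < M / 2:
        raise ValueError("ripple frequency must satisfy 0 < k < M/2")
    i = np.arange(1, M + 1)
    phase = 2 * np.pi * k * i / M
    return np.sin(phase) if kind == "sin" else np.cos(phase)

def rippled_profile(k, scale=0.2, M=21, kind="sin"):
    """In-phase sum of a unit-amplitude flat standard and the scaled signal."""
    return 1.0 + scale * ripple_amplitudes(k, M, kind)

rms_by_k = {k: float(np.sqrt(np.mean(ripple_amplitudes(k) ** 2))) for k in range(1, 11)}
```

For k sharing a factor with M = 21 (k = 3, 6, 7, 9), the a[i] take fewer distinct values than for k = 1, but their sum of squares is still M/2, so the RMS is unchanged.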

Thresholds were measured as the RMS amplitude of the signal re the RMS amplitude of the standard. Values of k ranged from 1 to 10. Thresholds were virtually constant for all values of k (ripple frequency) and type of variation (sine or cosine), with an average of -24.5 dB across all conditions (Green, Onsan, and Forrest, 1986).

These data define a modulation transfer function (MTF). Interestingly, this function is flat rather than exhibiting the low-pass characteristic that is typically observed in sensory psychophysics. Because k may not exceed 10 for this 21-component complex, we were unable to investigate higher ripple frequencies and thus to assess the form of the MTF more completely. Undoubtedly, thresholds would increase if the ripple frequency were sufficiently large. We are currently examining the effect of greater ripple frequencies by using profiles composed of a greater number of components. These data will allow us to describe more fully the MTF, i.e., the relation between the frequency of the ripple and detectability.

Finally, let us compare the rippled-spectrum thresholds with predictions based on the ability to discriminate a bump in the spectrum, that is, on data obtained using increments to a single component of the profile. Because the ability to detect an increment in a single component of a 21-component spectrum is, to a first approximation, independent of the frequency of the signal (Fig. 2), one may predict the threshold for these 21-component rippled spectra. If we assume that the information concerning changes in the intensity of each of the signal's 21 channels is processed independently and that d' is proportional to pressure, then the optimal combination is the one in which the squared d' for the complex stimulus is equal to the sum of the squared d's associated with each of the channels (Green and Swets, 1966). This leads to the expectation that detectability will be improved by the square root of 21.

The process is as follows. Detecting a bump in a flat profile leads to thresholds of about -16 dB. This translates to a pressure of 0.16 relative to the standard. Thus, we would expect the average pressure per component for a 21-component signal to be 0.16/√21, or 0.035 (relative to the standard), which is equivalent to an RMS amplitude of -29 dB. This value is 4.5 dB smaller than the mean of -24.5 dB observed.

Thus, performance on the complex spectral-shape discrimination task is poorer than expected based on the data collected using changes in the intensity of a single component in the spectrum.
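The prediction can be reproduced step by step, assuming, as in the text, that d' is proportional to pressure and that the 21 channels combine as the square root of the sum of squared d's. Carrying the unrounded values through gives a predicted threshold near -29.2 dB and a shortfall of about 4.7 dB; the 4.5-dB figure in the text reflects rounding the prediction to -29 dB. The function names are hypothetical.

```python
import math

def db_to_pressure(db):
    return 10 ** (db / 20)

def pressure_to_db(p):
    return 20 * math.log10(p)

def predicted_ripple_threshold_db(single_bump_db=-16.0, n_channels=21):
    """Optimal independent combination: d'^2 adds across channels and d' is
    proportional to pressure, so the per-component pressure needed at
    threshold falls by sqrt(n_channels)."""
    p_single = db_to_pressure(single_bump_db)           # about 0.16
    p_per_component = p_single / math.sqrt(n_channels)  # about 0.035
    return pressure_to_db(p_per_component)

predicted = predicted_ripple_threshold_db()    # about -29.2 dB
shortfall = -24.5 - predicted                  # about 4.7 dB
```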

I  I 

One could argue, of course, that there are fewer than 21 independent estimates of the spectrum. This is certainly possible, but two points argue against it. The first is that only six or seven independent channels across the 200- to 5000-Hz range would be needed to achieve the level of performance found using the rippled spectra. Second, if the different components are not processed independently, then increasing the ripple frequency would be expected to increase discrimination thresholds. Rather, we find that ripple frequency does not affect threshold over the range of values tested, and that the thresholds obtained using complex, rippled spectra fall short of those expected from the discrimination of changes in a single component of the profile.
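The "six or seven channels" figure follows from inverting the same square-root rule: the observed improvement of 8.5 dB (from -16 to -24.5 dB) is a pressure ratio of about 2.66, and squaring it gives the number of equally informative, independent channels that would account for it. A minimal sketch of that arithmetic, using only numbers from the text:

```python
import math

SINGLE_BUMP_DB = -16.0  # single-component threshold (from the text)
OBSERVED_DB = -24.5     # rippled-spectrum threshold (from the text)

# Improvement actually observed, expressed as a pressure ratio (8.5 dB).
improvement = 10 ** ((SINGLE_BUMP_DB - OBSERVED_DB) / 20)

# Under the sqrt(n) combination rule, n independent channels improve
# detectability by sqrt(n), so the number of channels that would account
# for the observed improvement is the square of the ratio.
n_effective = improvement ** 2

print(f"improvement = {improvement:.2f}, effective channels = {n_effective:.1f}")
```

The result is about seven, in line with the estimate in the text.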
Detection of a change in spectral shape, or profile analysis, appears to be mediated by comparisons across widely separated frequency “channels” rather than by local comparisons among adjacent frequency regions [e.g., Green et al., J. Acoust. Soc. Am. 73, 639-643 (1983)]. Two experiments were conducted in order to determine the “resolution bandwidth” of these channels. The first involved detection of an increment to a single component of a multicomponent background as a function of the number of components in the background. Performance improved as the number of components was increased from 3 to 21. Further increases yielded poorer performance, and the estimate of the “resolution bandwidth” from these data suggests that this poorer performance was due simply to masking. The second experiment involved discrimination of a multicomponent complex having a flat amplitude spectrum from one having a sinusoidally “rippled” amplitude spectrum. The latter experiment yielded somewhat larger estimates of the “resolution bandwidth” than did the former. Finally, profile analysis was investigated under a dichotic condition that precluded peripheral masking of the signal. Our results, like those of Green and Kidd [J. Acoust. Soc. Am. 73, 1260-1265 (1983)], suggest that, although spectral analysis can be achieved using information across ears, performance is inferior to that obtained with diotic stimuli.

PACS numbers: 43.66.Fe, 43.66.Jh, 43.66.Rq, 43.66.Yw


INTRODUCTION 


A wide variety of experimental data reported in previous publications suggests that detection of a change in spectral shape, or profile analysis, is a “global” process. The detection process appears to depend upon simultaneous comparisons across wide separations in frequency, i.e., across widely separated “channels,” rather than on local comparisons among adjacent frequency regions (e.g., Green et al., 1983; Green et al., 1984). Consideration of the nature of this process has led to two related questions. The first question concerns the bandwidth of each of these channels. The second question concerns how information is combined across the individual channels. We choose to refer to these channels as “resolution bands” rather than as critical bands because, although the two are probably closely related, it is unclear, a priori, whether they are indeed identical. The first two experiments we report address these questions. Two quite different experiments were employed to determine the width of the “resolution bands.” In the first, we measured listeners’ thresholds for an increment to a single component of a multicomponent background as a function of the number of components in the background. In the second, listeners discriminated between a flat multicomponent background and one characterized by a sinusoidally “rippled” spectrum. Thresholds were determined as a function of the number of “ripples,” or the “frequency” of the ripple.

A related question concerns the extent to which the detection process is limited to peripheral processes. Certainly, some aspects of profile analysis appear to suggest a central comparison, because increasing the density of components that define the profile leads to improvements in performance (e.g., Green et al., 1983; Green et al., 1984; Green and Mason, 1985). However, peripheral aspects are also apparent: if the components which compose the multicomponent background, or “standard,” are spaced so closely that several components fall near the frequency of the signal, a decrement in performance results which appears to be due, at least in part, to simple masking (Green and Mason, 1985).

In almost all of the experiments concerning profile analysis reported to date, the stimuli have been presented diotically; that is, the stimuli were identical at each ear. In the third experiment, we wished to compare performance obtained with diotic stimuli to that obtained when the stimuli were presented dichotically. That is, the profile (except for the component at the signal frequency) was presented to one ear, and the component to which the signal was added was presented to the contralateral ear. Green and Kidd (1983) also used this dichotic configuration and found performance to be substantially inferior to that obtained with diotic stimuli. However, it is unclear to what extent this result was influenced by their listeners having received substantial prior training with the diotic presentation. In the third experiment, we again attempted to determine whether profile analysis can be achieved when the stimuli are presented dichotically. More specifically, we wished to assess (1) how efficiently “profile” information could be integrated across the ears and (2) the form of the function relating detection threshold to the number of components which compose the background when the possibility of peripheral masking of the signal is removed.
I. EXPERIMENT 1: EFFECTS OF SPECTRAL DENSITY
A.  Procedure 

Multicomponent  complexes  consisting  of  3,  5,  11,  21, 
41,  or  81  components  with  logarithmic  frequency  spacing 
between  components  were  utilized.  For  each  complex,  the 
lowest  frequency  was  200  Hz;  the  highest  was  5  kHz. 

All stimuli were generated and presented via a PDP-11/73, which also controlled the experimental timing and collection of responses. The stimuli were played through a 16-bit D/A at a sampling rate of 25 kHz and were low-pass filtered at 10 kHz. The duration of each stimulus was 100 ms with 10-ms cos² rise/decay ramps. The stimuli were presented diotically over TDH-50 earphones.

A two-alternative, forced-choice procedure was used. Each trial consisted of two 100-ms observation intervals separated by 500 ms. Intervals were marked by a visual display at the listener’s response box. Feedback was provided for 200 ms after the listener responded.

During one observation interval, the multicomponent background was presented with all components at equal amplitude. The other interval contained the background plus the signal. The signal consisted of an in-phase addition to a single component of the complex. The signal occurred with equal a priori probability in the first or second interval.

Three different frequencies were selected for the signal: 380, 1000, and 2626 Hz. The exceptions were the five-component complex, for which frequencies of 447, 1000, and 2236 Hz were employed, and the three-component complex, for which only one signal frequency, 1 kHz, was employed. Different frequencies of the signal were utilized in order to determine whether the resolution bandwidth depended on center frequency; e.g., the bandwidth might be a constant ratio of center frequency (signal frequency). If this were so, then decreases in performance (presumably due to masking) produced by increasing spectral density ought to be similar regardless of the region of the spectrum which contains the signal.
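The signal frequencies follow from the component spacing. In a log-spaced complex running from 200 to 5000 Hz with an odd number of components, the middle component sits at the geometric mean, 1 kHz, and 380 and 2626 Hz are (to rounding) the components four steps on either side of it in the 21-component complex; 447 and 2236 Hz are the neighbors of 1 kHz in the 5-component complex. The sketch below checks this; the helper `log_spaced_components` and the index choices are our reading of the numbers, not something the text states.

```python
def log_spaced_components(n, f_low=200.0, f_high=5000.0):
    """Frequencies of an n-component complex with equal ratios between
    successive components, from f_low to f_high inclusive."""
    ratio = (f_high / f_low) ** (1.0 / (n - 1))
    return [f_low * ratio ** i for i in range(n)]


for n in (3, 5, 11, 21, 41, 81):
    freqs = log_spaced_components(n)
    middle = freqs[(n - 1) // 2]  # geometric centre of 200-5000 Hz, i.e. 1 kHz
    print(n, round(middle, 3))

# Off-centre signal frequencies: components 4 and 16 of the 21-component
# complex, and components 1 and 3 of the 5-component complex (0-based).
f21 = log_spaced_components(21)
f5 = log_spaced_components(5)
print([round(f21[i], 1) for i in (4, 10, 16)])
print([round(f5[i], 1) for i in (1, 2, 3)])
```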

The level of the signal was varied adaptively in order to estimate the level which would produce 70.7% correct (Levitt, 1971). The level was decreased by 4 dB following two correct responses and increased by 4 dB following one incorrect response. After four “reversals,” this step size was reduced to 2 dB. Threshold was defined as the mean of the signal level across all reversals, excluding the first four. Trials were run in blocks of 50, and each run produced approximately 12 to 16 reversals. The frequency of the signal was fixed over each block of trials. Twenty-four estimates of threshold were obtained for each listener and condition. The mean of these estimates, averaged across listeners, is reported as threshold.
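The adaptive rule described above is a two-down, one-up staircase. Below is a minimal sketch of it under stated assumptions: `run_staircase` and the `respond` callback are hypothetical names, and a reversal is recorded at the level of the trial on which the track changed direction (the text does not specify this detail). The toy listener is deterministic and is only there to exercise the logic, not to model real listeners.

```python
import math


def run_staircase(respond, start_db, n_trials=50,
                  big_step=4.0, small_step=2.0, n_big=4):
    """Two-down, one-up adaptive track (Levitt, 1971), which converges on
    the 70.7%-correct point. `respond(level)` returns True for a correct
    response. Returns the mean level at reversals, excluding the first n_big."""
    level = start_db
    run_of_correct = 0
    last_direction = 0  # -1 = last step was down, +1 = up, 0 = no step yet
    reversals = []
    for _ in range(n_trials):
        if respond(level):
            run_of_correct += 1
            if run_of_correct < 2:
                continue              # need two in a row before stepping down
            run_of_correct = 0
            direction = -1
        else:
            run_of_correct = 0
            direction = +1
        if last_direction and direction != last_direction:
            reversals.append(level)   # track changed direction at this level
        last_direction = direction
        step = big_step if len(reversals) < n_big else small_step
        level += direction * step
    if len(reversals) <= n_big:
        raise ValueError("too few reversals to estimate a threshold")
    kept = reversals[n_big:]
    return sum(kept) / len(kept)


# The rule targets p such that p * p = 0.5, i.e. about 70.7% correct.
print(f"target: {math.sqrt(0.5):.3f}")

# Toy deterministic listener who is always correct at or above -20 dB.
print(run_staircase(lambda level: level >= -20.0, start_db=0.0))
```

With a real listener the responses are probabilistic, which is why the text averages 24 such estimates per condition.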

The overall level of the stimuli was varied over a 20-dB range in 1-dB steps. A value was chosen randomly on each and every presentation in order to preclude the listeners' basing their judgments on absolute level rather than on the spectral shape. The median level was 50 dB SPL per component. The dependent variable (threshold) is the ratio in dB of the level of the signal (the size of the in-phase addition) to the level of the corresponding component in the background. For example, if the level of the signal were equal to the level of the component in the background, then we say the signal-to-background ratio is 0 dB.
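Because the signal is added in phase, amplitudes add directly, so a signal-to-background ratio r (in dB) raises the incremented component's level by 20·log10(1 + 10^(r/20)) dB. At 0 dB this is about 6 dB; at -20 dB, near the best thresholds, it is under 1 dB, far smaller than the 20-dB level rove, which is why absolute level cannot serve as the cue. The sketch below illustrates this; the function names are hypothetical, and the rove is assumed to be symmetric about the 50-dB median (40 to 60 dB), which the text implies but does not state.

```python
import math
import random


def incremented_amplitude(ratio_db, component_amp=1.0):
    """In-phase addition: the signal amplitude adds directly to the
    component's amplitude. ratio_db is the signal-to-background ratio."""
    return component_amp * (1 + 10 ** (ratio_db / 20))


def component_level_change_db(ratio_db):
    """How much the incremented component's level rises, in dB."""
    return 20 * math.log10(incremented_amplitude(ratio_db))


def roved_level_db(rng, median_db=50, range_db=20):
    """Overall level for one presentation: a random whole-dB value from a
    range_db-wide range centred on median_db (assumed symmetric)."""
    half = range_db // 2
    return median_db + rng.randint(-half, half)


print(round(component_level_change_db(0.0), 2))    # equal amplitudes
print(round(component_level_change_db(-20.0), 2))  # near the best thresholds
rng = random.Random(1)
print([roved_level_db(rng) for _ in range(5)])
```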

Five  paid  observers  with  normal  hearing  participated  in 
this  experiment. 

B.  Results  and  discussion 

Figure 1 contains the data obtained when the frequency of the signal was 1 kHz. The number of components in the profile is plotted logarithmically along the abscissa; signal threshold in dB is displayed along the ordinate. Each point represents the mean of the thresholds obtained from the five listeners. The error bars represent the mean of the standard errors computed for the individual listeners. The solid lines represent our theoretical predictions and will be discussed in detail later. The data indicate that as the number of components is increased from 3 to 21, threshold decreases monotonically (the signal becomes more detectable) from about -11 to -20 dB.

As the number of components is increased beyond 21, threshold increases monotonically to about -8 dB for an 81-component complex. The sharp minimum in the function at 21 components is also characteristic of the individual data. These data are entirely consistent with those of earlier investigations (Green and Mason, 1985; Green et al., 1983, 1984).

As mentioned in the introduction, these trends can be explained in a rather straightforward manner. Assume that the listener detects the presence of the signal by comparing the relative level in the resolution band containing the frequency of the signal to the level of the remaining bands across the spectrum. As the number of components which compose the profile is increased from 3 to 21, additional independent bands or channels contribute to an estimate of the mean “level” of the profile. As the number of components, and thus the density of the profile, is increased beyond 21, additional components fall into the “resolution band” of the signal. The addition of the signal then produces a relatively smaller increase in power within the band and thus becomes more difficult to detect. According to these notions, the monotonic decrease in threshold in the left-hand portion of Fig. 1 is due to integration of information across bands. The monotonic increase in the right-hand portion is due to simple masking. We will now examine these explanations more formally.

FIG. 1. Threshold for detection of an increment to the 1-kHz “signal” component of a multicomponent background as a function of the number of components in the background. Squares represent thresholds averaged across listeners. Error bars represent the mean of the standard errors computed for individual listeners. Solid lines represent theoretical predictions as discussed in the text.

The monotonic decrease in threshold in the left-hand portion of the figure was modeled by assuming that as components are added to the complex, they yield additional independent estimates of the level of the profile. Under this assumption, threshold would be expected to decrease at a rate of 1.5 dB per doubling of the number of components, that is, in proportion to √n. A line having this slope was fit to the data by eye and appears to predict the decrease in threshold quite well.

For increases in the number of components beyond 21, an intuitively appealing explanation is that simple masking causes an increase in threshold. If the density of the complex is such that the addition of components causes one or more to fall within a common “resolution band,” then they provide no additional information as to the level of the profile. Rather, their presence causes the signal to produce a smaller relative increase in power within the resolution band and thus a less effective signal. Because the minimum of the function lies at 21 components, this value appears to be a good estimate of the spectral density at which this occurs.

The increase in threshold in the right-hand portion of Fig. 1 was modeled in the following manner. For simplicity, we assumed that for the 21-component complex, only the 1-kHz signal component lies within the resolution band. Next, we calculated the increase in power within the band produced by the signal at threshold for this condition. The resolution band was modeled as a triangular filter, symmetric in log space, whose bandwidth we wished to determine. For the 41- and 81-component complexes, multiple components would, presumably, fall within this resolution band. As a function of the bandwidth of the filter, we calculated the level of the signal necessary to produce a constant increment in power within the band, i.e., the same increase in power produced by the signal at threshold for the 21-component complex. The predicted thresholds are plotted as the solid line in the right-hand portion of the figure for a triangular filter extending from 852 to 1174 Hz. Our predicted thresholds lie remarkably close to the data. Most important, the equivalent rectangular bandwidth of our filter, 162 Hz, compares favorably with accepted estimates of the critical band around 1 kHz.
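The equivalent rectangular bandwidth (ERB) of a filter is the area under its power weighting divided by the peak weight. The sketch below computes it numerically for a filter that we assume is triangular in power on a log-frequency axis, peaking at 1 kHz and reaching zero at 852 and 1174 Hz; the text does not say exactly how the triangle or the ERB was computed. Under these assumptions the result comes out near 161 Hz, close to the reported 162 Hz.

```python
import math

F_LO, F_C, F_HI = 852.0, 1000.0, 1174.0  # filter edges and peak (Hz), from the text


def triangular_weight(f):
    """Power weighting that is triangular on a log-frequency axis:
    1 at F_C, falling linearly in log f to 0 at F_LO and F_HI."""
    if f <= F_C:
        w = 1 - math.log(F_C / f) / math.log(F_C / F_LO)
    else:
        w = 1 - math.log(f / F_C) / math.log(F_HI / F_C)
    return max(0.0, w)


def equivalent_rectangular_bandwidth(n_steps=100_000):
    """Area under the weighting (trapezoid rule over F_LO..F_HI) divided
    by the peak weight, which is 1."""
    h = (F_HI - F_LO) / n_steps
    area = 0.5 * (triangular_weight(F_LO) + triangular_weight(F_HI))
    area += sum(triangular_weight(F_LO + i * h) for i in range(1, n_steps))
    return area * h


print(f"ERB = {equivalent_rectangular_bandwidth():.1f} Hz")
```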

The reader may be puzzled (as were we) that thresholds increase at a rate of about 6 dB per doubling of the number of components rather than at 3 dB per doubling. Note that we have calculated the size of the increment which must be added in-phase to the single component at the frequency of the signal in order to produce a constant increment in power within the band. Our calculations reveal that as the number of components is increased beyond 21 and multiple components begin to fall within the resolution band, the slope of the line relating threshold to the number of components is predicted to be about 5.5 dB per doubling, very close to that actually obtained. (The predicted slope eventually asymptotes to 3 dB per doubling, but only for very, very large numbers of components, i.e., 10 000 or more!)
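The masking calculation can be sketched as follows, under assumptions we have had to make because the text does not spell them out: the band weights power by a triangle on a log-frequency axis with edges at 852 and 1174 Hz, every background component has unit amplitude, and "constant increment in power" means a constant relative increment, calibrated at the 21-component threshold of about -20 dB. Because the signal adds in phase to a single component, its amplitude must grow as band power grows, which is why the slope exceeds 3 dB per doubling at first. With these choices the predicted thresholds for 41 and 81 components come out near -14.4 and -9.0 dB, i.e., slopes of roughly 5.4 to 5.6 dB per doubling. For very large n, band power grows in proportion to n, so the required amplitude grows as √n and the slope approaches 3 dB per doubling.

```python
import math

F_LO, F_C, F_HI = 852.0, 1000.0, 1174.0  # triangular resolution band (Hz), from the text


def band_weight(f):
    """Power weighting, triangular on a log-frequency axis (peak 1 at F_C)."""
    if f <= F_C:
        w = 1 - math.log(F_C / f) / math.log(F_C / F_LO)
    else:
        w = 1 - math.log(f / F_C) / math.log(F_HI / F_C)
    return max(0.0, w)


def components(n, f_low=200.0, f_high=5000.0):
    ratio = (f_high / f_low) ** (1.0 / (n - 1))
    return [f_low * ratio ** i for i in range(n)]


def band_power(n):
    """Power in the band from n equal, unit-amplitude components."""
    return sum(band_weight(f) for f in components(n))


def threshold_db(n, rel_increment):
    """Signal-to-background ratio (dB) at which an in-phase addition to the
    1-kHz component raises band power by rel_increment (a fraction).
    That component's power goes from 1 to (1 + a)**2, so we need
    (1 + a)**2 - 1 = rel_increment * band_power(n)."""
    a = math.sqrt(1 + rel_increment * band_power(n)) - 1
    return 20 * math.log10(a)


# Calibrate on the 21-component threshold of about -20 dB (only the
# 1-kHz component falls inside the band for this complex).
a21 = 10 ** (-20 / 20)
REL_INCREMENT = ((1 + a21) ** 2 - 1) / band_power(21)

thresholds = {n: threshold_db(n, REL_INCREMENT) for n in (21, 41, 81)}
for n, t in thresholds.items():
    print(f"{n:3d} components: {t:6.1f} dB")
print(f"slope 21->41: {thresholds[41] - thresholds[21]:.1f} dB per doubling")
print(f"slope 41->81: {thresholds[81] - thresholds[41]:.1f} dB per doubling")
```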

In summary, the data in Fig. 1 are described well by considering the improvement in performance as the number of components is increased from 3 to 21 to be due to integration of information across independent bands or channels, and the decrement in performance for increases beyond 21 to be due to masking. To the degree that the data depart from these predictions, they do so largely because the monotonic decrease in threshold does not exhibit a uniform slope. Rather, as noted above, there appears to be a sharp drop between 11 and 21 components. At present, we have no satisfactory explanation for this trend, which is also exhibited in the individual listeners’ data.

Figure 2 is similar to Fig. 1 and contains the data obtained for frequencies of the signal of 380 Hz, 1 kHz, and 2.626 kHz. The parameter of the plot is the frequency of the signal. Three of the listeners from the original group of five participated in this portion of the experiment. The 1-kHz data have been replotted from Fig. 1. Each point represents the mean of their thresholds. Note that in the case of the five-component background, the low and high frequencies employed were actually 447 Hz and 2.236 kHz, respectively.

Curiously, the thresholds obtained with signals above and below 1 kHz do not exhibit a sharp minimum at 21 components. However, for all three frequencies of the signal, thresholds increased rapidly when the number of components was increased beyond 21. This finding suggests that the width of the resolution band is a constant ratio of center frequency over the range of values tested. Note that the thresholds for the 380-Hz and 2.626-kHz signals are elevated relative to those obtained when the signal was added to the central, 1-kHz component. The average increase in threshold relative to the 1-kHz signal is 4.7 and 6.1 dB for the 380-Hz and 2.626-kHz signals, respectively.

This elevation of threshold for high- or low-frequency signals is consistent with previous studies. Green and Mason (1985), who used stimuli similar to those employed here, also observed that detection thresholds were lowest when the signal was added to the central, 1-kHz component of a 21-component complex. In addition, a more recent investigation in our laboratory (Green et al., 1987) also revealed this trend. However, the results of this latter study also indicated that (1) when listeners receive substantial practice, thresholds for low-frequency signals are extremely close to those obtained with a 1-kHz signal, differing by only about 3 dB, and (2) above 1 kHz, threshold increases slowly with frequency, reaching +6 dB relative to that obtained at 1 kHz. The data in Fig. 2 agree with these findings in that thresholds are, in general, lowest for the 1-kHz signal and highest for the 2.626-kHz signal.

FIG. 2. Similar to Fig. 1. The parameter of the plot is the frequency of the signal; triangles: 380 Hz; squares: 1 kHz; circles: 2626 Hz. Note that for the five-component background, signal frequencies of 447, 1000, and 2236 Hz were employed.

II. EXPERIMENT 2: VARIATION OF SINUSOIDAL RIPPLE

The purpose of this experiment was to provide an additional, independent estimate of the resolution bandwidth. In this experiment, the signal produced a sinusoidal change in the amplitudes of the flat profile rather than an increment to a single component. That is, the addition of the signal produced what we refer to as a “rippled” spectrum. We wished to investigate how detection would be affected as a function of the number of ripples.

The results of a previous investigation (Green et al., 1987) showed that thresholds were virtually constant as the “frequency,” or number of ripples, was varied from one to ten. We reasoned that if the frequency of the ripple were increased further, a point would be reached where the relatively high-frequency sinusoidal variation in amplitude would not be detectable because individual “cycles” would fall within single resolution bands. The “internal” spectrum would thus be flat and indistinguishable from the background. That is, we expected the data to exhibit a low-pass characteristic. The point at which sensitivity begins to decline would indicate the spacing at which a peak and valley of the ripple begin to fall within a single band and, hence, would provide an estimate of the resolution bandwidth.

A.  Procedure 

The standard waveform was a 161-component flat spectrum that ranged in frequency from 200 to 5000 Hz. The successive components were spaced equally on a logarithmic scale. The ratio between successive frequencies was 1.0203; there were 34.5 components per octave. The addition of the signal produced a power spectrum whose amplitude varied sinusoidally as a function of the logarithm of frequency. Figure 3 shows this manipulation graphically for a 21-component complex. The first panel shows a single cycle of sinusoidal variation in amplitude over the spectrum; the next shows two cycles of amplitude variation; and, finally, the last panel shows ten cycles, the greatest variation that can be achieved with 21 components, because alternate components increase and decrease in amplitude.

Specifically, the “signal” waveform was produced by setting the amplitude of successive components, a(i), according to the following equation:

$$a(i) = \sin\left[2\pi k \left(\frac{i}{M}\right)\right], \qquad i = 1, 2, \ldots, M, \qquad (1)$$

where i is the number of the component, ranging in this case from 1 to 161 (so that M, the number of components, is 161), a(i) is the amplitude of the ith component of the signal spectrum, and k is the frequency of the ripple, i.e., the number of ripple cycles across the spectrum. Recall that the first component, i = 1, corresponds to a frequency of 200 Hz, and the last component, i = 161, corresponds to a frequency of 5000 Hz.
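Equation (1) and the component spacing can be written out directly. This sketch uses hypothetical helper names; it reproduces the ratio of about 1.0203 between successive components and about 34.5 components per octave quoted above. Because sin(2πk) = 0 for integer k, the top component (i = M) is never altered by the signal.

```python
import math

M = 161                        # number of components in the standard
F_LOW, F_HIGH = 200.0, 5000.0  # frequency range (Hz)


def component_frequency(i, m=M):
    """Frequency of component i (1-based), equally spaced on a log axis."""
    return F_LOW * (F_HIGH / F_LOW) ** ((i - 1) / (m - 1))


def ripple_amplitude(i, k, m=M):
    """Eq. (1): a(i) = sin(2*pi*k*i/M) for ripple frequency k."""
    return math.sin(2 * math.pi * k * i / m)


ratio = component_frequency(2) / component_frequency(1)
per_octave = (M - 1) / math.log2(F_HIGH / F_LOW)
print(f"ratio = {ratio:.4f}, components per octave = {per_octave:.1f}")

# One ripple cycle (k = 1) sampled at a few components: (Hz, amplitude).
print([(round(component_frequency(i)), round(ripple_amplitude(i, k=1), 3))
       for i in (1, 41, 81, 121, 161)])
```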

The “depth” of the ripple resulting from the addition of the signal to the standard waveform depends upon the ratio of the amplitudes of the signal components to those of the standard’s equal-amplitude components; the depth of the ripple is, of course, monotonically related to this signal-to-standard ratio. We scaled the amplitude of this “signal” and added each component in-phase (respecting sign) to the corresponding component of the flat standard spectrum to produce the change in the spectrum, such as that shown in Fig. 3. In that figure, the signal amplitude is about 20% of the standard amplitude.

It should be noted that by constructing the signal in the manner described above, the root-mean-square (rms) of the amplitudes across components is independent of the frequency of the ripple k, because the 161 values for any set of a(i) are the same; only their order within the set has been changed. If the maximum value for a(i) is 1, the rms value is 0.707. We define the signal-to-standard ratio as the ratio of the rms signal amplitude to the amplitude of any component of the standard.
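The rms invariance is easy to confirm numerically, and it also shows how a signal-to-standard ratio in dB maps onto the scale factor applied to a(i). In this sketch `rippled_spectrum` is a hypothetical helper, and the ripple frequencies checked are illustrative (the text names 1, 10, 40, and 80). One caveat: because M = 161 is odd, the rms is exactly √0.5 for any integer k from 1 to 160, but the "same set, reordered" argument holds only when k shares no factor with 161 = 7 × 23 (k = 7, for instance, repeats values yet has the same rms).

```python
import math

M = 161


def ripple(k, m=M):
    """Signal amplitudes a(i) = sin(2*pi*k*i/M), i = 1..M (Eq. 1)."""
    return [math.sin(2 * math.pi * k * i / m) for i in range(1, m + 1)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def rippled_spectrum(k, ratio_db, m=M):
    """Standard (all amplitudes 1) plus the scaled signal, added in phase
    with sign respected. ratio_db is the rms signal amplitude relative to
    one standard component."""
    a = ripple(k, m)
    scale = 10 ** (ratio_db / 20) / rms(a)
    return [1.0 + scale * x for x in a]


for k in (1, 2, 5, 10, 20, 40, 80):
    print(k, round(rms(ripple(k)), 6))
```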

Note that one disadvantage of this technique is that we cannot determine whether the decline in detection performance as the number of ripples is increased is dominated by any specific frequency region(s). The data from experiment 1 suggest that the resolution bandwidth is, roughly, a constant proportion of center frequency. If this were true, then because our spectra are rippled sinusoidally as a function of the logarithm of frequency, the predicted decline in detection performance as the frequency of the ripple increases would be mediated by a uniform loss of resolution across the spectrum. In any case, our estimate of the resolution bandwidth yielded by this procedure must be a relative one, in effect, an estimate of Q.

FIG. 3. Three different frequencies of ripple k, using sinusoidal variation. The signal amplitude at each component frequency is given by Eq. (1) and is added to the standard with a relative amplitude of about 1/5 that of the standard.

The  five  observers  who  participated  in  experiment  1  also 
participated  in  this  experiment. 

B.  Results  and  discussion 

The data are displayed in Fig. 4, where the number of ripples imposed on the flat spectrum is plotted logarithmically along the abscissa. The threshold of the signal is displayed in the usual manner along the ordinate. Recall that we have reported the rms value as our measure of the amplitude of the signal.

The average data obtained from three of the five listeners (group 1) are plotted as squares; triangles represent the average data from the remaining two (group 2). Error bars represent the mean of the standard errors computed for the individual listeners whose data are displayed. Data averaged across all listeners are plotted along the solid line.

The data in Fig. 4 indicate that thresholds remain fairly constant as the number of ripples is increased from 1 to 10, a result which was also obtained by Green et al. (1987). As the “frequency” of the spectral ripple is increased beyond 10 to 80, thresholds increased monotonically for group 1 at a rate of about 6 dB/oct, reaching about -10.5 dB at 80 ripples. Thresholds for the two listeners in group 2 are higher than those of group 1 over the entire range of values tested. In addition, they do not increase monotonically between 10 and 80 ripples. Rather, the thresholds obtained in the 40-ripple condition are higher than those obtained with 80 ripples.

At first, we thought this anomaly was either due to random fluctuations in the data or was artifactual. We reran several of the conditions after generating new signals for the 40-ripple condition and found that the elevation in threshold at 40 ripples persisted for these two listeners. Finally, we ran a third group of three listeners, which included one naive listener and two highly trained listeners employed in our laboratory. Interestingly, both the naive listener and one of the highly trained listeners exhibited a nonmonotonicity similar to that displayed for group 2 in Fig. 4: their thresholds for 40 ripples were higher than those for 80. We find this trend, which occurred in four of eight listeners tested, to be most perplexing. We are unable to offer a satisfactory explanation for its existence.

FIG. 4. Detection threshold as a function of the number of sinusoidal “ripples” imposed on the flat spectrum by the signal. Triangles: average data for group 1; open circles: average data for group 2; solid line: average data for all listeners. Error bars represent the mean of the standard errors computed for individual listeners.

In general, the data of Fig. 4 exhibit the expected low-pass characteristic described earlier. When the number of ripples is increased from one to ten, thresholds remain essentially constant. As the number of ripples is increased beyond ten, thresholds increase. We attempted to use the data of Fig. 4 to determine the “low-pass cutoff” of the function and, ultimately, an estimate of the resolution bandwidth.

We used several procedures to fit two straight lines to each group’s data and took their respective intersections as estimates of the “corner,” or 3-dB-down point, of the implied resolution band. Depending on the details of the procedure used to fit the data, the intersection occurred between 10 and 14 ripples. Considering that our components span 4.64 octaves, a 3-dB point of 14 ripples implies that the resolution band spans 0.33 octaves. At 1 kHz, the resolution band would be about 230 Hz wide. Similar calculations for a cutoff of ten ripples yield a bandwidth of about 320 Hz. There is a considerable discrepancy between these estimates of the resolution bandwidth and that of 160 Hz obtained from the data of experiment 1. In addition, they are quite a bit larger than usual estimates of the critical band in this region.
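The conversion from a cutoff ripple frequency to a bandwidth can be sketched as below. We assume, as the numbers in the text imply, that the resolution band equals one ripple cycle (span in octaves divided by the number of ripples) and is centered geometrically on 1 kHz; the helper names are ours. These assumptions reproduce 0.33 octaves and about 230 Hz (Q of about 4.34) for 14 ripples, and about 320 Hz for 10 ripples.

```python
import math

SPAN_OCTAVES = math.log2(5000 / 200)  # 200-5000 Hz, about 4.64 octaves


def band_octaves(cutoff_ripples):
    """Width of one ripple cycle, in octaves, at the cutoff ripple frequency."""
    return SPAN_OCTAVES / cutoff_ripples


def bandwidth_hz(cutoff_ripples, center_hz=1000.0):
    """Width in Hz of a band of that many octaves, centred geometrically on center_hz."""
    half = band_octaves(cutoff_ripples) / 2
    return center_hz * (2 ** half - 2 ** -half)


for cutoff in (10, 14):
    bw = bandwidth_hz(cutoff)
    print(f"cutoff {cutoff} ripples: {band_octaves(cutoff):.2f} oct, "
          f"{bw:.0f} Hz, Q = {1000.0 / bw:.2f}")
```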

It should be noted that, for the data of Fig. 4, a cutoff of slightly greater than 20 ripples would have had to be observed in order for the estimate of the resolution bandwidth to match that of 160 Hz obtained from experiment 1. No reasonable fit to the data of Fig. 4 would yield such an estimate. Thus we are confident that the discrepancy in our estimates of the resolution bandwidth is not a result of the particular details of the procedures we employed to fit the data.
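The "slightly greater than 20 ripples" figure can be recovered by inverting the same conversion, again assuming one ripple cycle per resolution band, centered geometrically on 1 kHz. The width in Hz of a band x octaves wide is fc·(2^(x/2) − 2^(−x/2)) = 2·fc·sinh(x·ln 2/2), which has a closed-form inverse. The helper name is hypothetical.

```python
import math

SPAN_OCTAVES = math.log2(5000 / 200)  # about 4.64 octaves


def cutoff_for_bandwidth(bw_hz, center_hz=1000.0):
    """Ripple cutoff whose one-cycle width, centred geometrically on
    center_hz, equals bw_hz. Inverts bw = fc * (2**(x/2) - 2**(-x/2)),
    which equals 2 * fc * sinh(x * ln(2) / 2), for the width x in octaves."""
    x = 2 * math.asinh(bw_hz / (2 * center_hz)) / math.log(2)
    return SPAN_OCTAVES / x


print(f"{cutoff_for_bandwidth(160.0):.1f} ripples")
```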

On the other hand, if the resolution bandwidth were not a constant proportion of center frequency, then our single estimate of the resolution bandwidth of 230 Hz around 1000 Hz, which corresponds to a Q of 4.34, could be dominated by any frequency region characterized by a proportionately small resolution bandwidth. However, because the data from experiment 1 suggest that the resolution bandwidth is roughly a constant proportion of center frequency, and because our spectra are rippled sinusoidally as a function of the logarithm of frequency and hence should occupy equal spatial intervals along the basilar membrane, we are reasonably sure that the decline in detection performance as the frequency of the ripple increases is accompanied by a uniform loss of resolution across the spectrum.

One possible, but unappealing, explanation for our disparate estimates of the resolution bandwidth is that the detection processes employed by the listeners in experiments 1 and 2 are sufficiently different to be mediated by different resolution bandwidths. It is interesting to note that our stimuli with rippled spectra are in some sense similar to those employed by others (Bilsen and Ritsma, 1970; Yost and Hill, 1978) who investigated the discrimination of flat spectra from those with a linear spectral ripple. Furthermore, the thresholds obtained in these studies are similar to ours. It is quite possible, then, that our listeners used pitch cues similar to those used by the listeners in these previous studies to detect the presence of the ripple. Such cues would not be expected to be available in the case of an increment to a single component (experiment 1). This suggests one way in which the tasks of experiments 1 and 2 may differ.

III. EXPERIMENT 3: DIOTIC/DICHOTIC COMPARISONS
The results of experiments 1 and 2 strongly suggest that detection of a change in spectral shape is limited by peripheral processes which produce peripheral masking. We noted in the introduction (as did Green and Kidd, 1983) that certain aspects of profile analysis appear to suggest some central comparison process(es). The present experiment was designed to investigate profile analysis in the absence of peripheral masking of the signal, in an effort to assess more adequately the role of central processes.

A.  Procedure 

The procedure was the same as that employed in experiment 1 with a few important exceptions. All stimuli were generated and presented via an IBM-PC, which also controlled the experimental timing and collection of responses. The stimuli were played through a 12-bit D/A at a sampling rate of 14.286 kHz and were low-pass filtered at 10 kHz.¹ The duration of each stimulus was 200 ms with a 5-ms, cos² rise/decay ramp. The frequency of the signal was 1 kHz. Detection was measured as a function of the number of components in the multicomponent background.
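At a sampling rate of 14.286 kHz, the Nyquist frequency is about 7.14 kHz, below the 10-kHz low-pass cutoff, so the first image of each component, at fs − f, can fall inside the filter passband; the footnote to this section reports such images near 9 kHz. The sketch below lists those images. It assumes, as the component frequencies quoted later in this experiment imply, that the components are equally spaced on a log-frequency scale from 200 to 5000 Hz, and it uses the 21-component complex as the example; the helper names are hypothetical, and only first-order images are considered.

```python
def log_spaced(n, f_lo=200.0, f_hi=5000.0):
    """n components equally spaced on a log-frequency axis from f_lo to f_hi."""
    ratio = (f_hi / f_lo) ** (1.0 / (n - 1))
    return [f_lo * ratio ** k for k in range(n)]


def images_in_passband(freqs, fs, cutoff):
    """First-order D/A images (fs - f) of each component that lie below the
    low-pass cutoff and would therefore pass the reconstruction filter."""
    return [fs - f for f in freqs if fs - f < cutoff]


fs = 14286.0
print("Nyquist frequency (Hz):", fs / 2)
print("10-kHz cutoff:", [round(f) for f in images_in_passband(log_spaced(21), fs, 10000.0)])
print("6-kHz cutoff:", images_in_passband(log_spaced(21), fs, 6000.0))
```

For the 21-component complex, only the image of the 5000-Hz component (near 9286 Hz) falls below a 10-kHz cutoff, and none survive a 6-kHz cutoff; complexes with more components near 5 kHz would contribute more images.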

Each trial consisted of two 500-ms observation intervals separated by 300 ms. The first 250 ms of each observation interval contained a visual warning display on the IBM’s monitor. Feedback was provided for 400 ms. Eighteen estimates of threshold were obtained for each listener and condition.

Signals were presented either diotically, as in experiment 1, or dichotically. When the dichotic configuration was employed, the “flat” profile (except for the component at the signal frequency) was presented to one ear and the component to which the signal was added was presented, in isolation, to the contralateral ear. Three paid observers with normal hearing, who had not participated in the previous experiments, took part in this experiment.

B.  Results  and  discussion 

Figure 5 displays the results for the diotic and dichotic conditions. The data for the diotic conditions are quite similar to those presented in Fig. 1 and, like those data, exhibit a minimum at 21 components. The dichotic thresholds are larger than the diotic thresholds for all numbers of components tested and do not exhibit any pronounced minimum. Indeed, they show little variation, having a mean of −10.4 dB and ranging from −9.0 dB at three components to −12.3 dB at five components. It is important to note that all of the dichotic thresholds displayed in Fig. 5 are better than the threshold that would be expected had the listeners ignored the profile information and used only the information in the contralateral ear, to which the single sinusoidal component was presented. If that were the case, and the listener based his or her decision on the interval containing the more intense tone, then the expected threshold, given the 20-dB random variation in overall level, would be −3 dB (Green, 1986). The average dichotic threshold is about 7 dB lower than this value.

FIG. 5. Threshold for detection of an increment to the 1-kHz “signal” component of a multicomponent background as a function of the number of components in the background. The parameter of the plot is the interaural configuration of the signal. Triangles and squares represent diotic and dichotic conditions, respectively.
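The −3-dB figure can be approximated under simple assumptions. The sketch below treats the overall level as a continuous uniform rove over 20 dB (the experiment actually used 1-dB steps) and assumes a listener who picks the interval with the louder single tone. The percent-correct criterion behind Green's (1986) value is not stated here; under these assumptions the 70.7% point comes out at about −2.9 dB. The function name is illustrative.

```python
import math


def roved_single_tone_threshold(p_correct, rove_db=20.0):
    """Signal-to-component ratio (dB) needed to reach p_correct when the listener
    simply picks the louder of two tones whose levels are roved independently.

    Assumes a continuous uniform rove of width rove_db on each presentation.
    The level difference between the two intervals is then triangular on
    [-rove_db, rove_db], and an increment of a dB gives
    P(correct) = 1 - (rove_db - a)**2 / (2 * rove_db**2) for 0 <= a <= rove_db.
    """
    a = rove_db - math.sqrt(2 * rove_db ** 2 * (1 - p_correct))
    # Convert the required level increment a back to 20*log10(delta_a / a).
    return 20 * math.log10(10 ** (a / 20) - 1)


print(f"70.7% correct: {roved_single_tone_threshold(0.707):.1f} dB")
print(f"79.4% correct: {roved_single_tone_threshold(0.794):.1f} dB")
```

A 79.4% criterion would, under the same assumptions, require a positive signal-to-component ratio, so the comparison above depends on which criterion is used.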

On the one hand, one would not expect the dichotic thresholds to increase as the number of components in the profile is increased beyond some critical value, because the “flat” profile (except for the component at the signal frequency) was presented to one ear and the component to which the signal was added was presented, in isolation, to the contralateral ear. Thus there was no opportunity for peripheral masking to degrade performance, as there was in the diotic conditions, and the data are consistent with this expectation.

On the other hand, if profile information could be combined across ears without loss, then, as the number of components increased from some small number, dichotic thresholds would be expected to decrease in a manner similar to that observed for the diotic thresholds. Furthermore, because this process would not be limited by masking, the dichotic thresholds could, theoretically, decline to some asymptotic value at or below that obtained in the most sensitive of the diotic conditions. This clearly was not the case.

That the dichotic thresholds do not decline as the number of components is increased beyond five appears to suggest that there is some limit to the extent to which profile information can be combined across ears, which renders further increases in the number of components ineffective. Why this is so remains obscure.
In summary, the data of Fig. 5 do not appear to constitute a strong test of the extent to which profile analysis is mediated by peripheral and/or central processes. It may be that, if central processes are involved, they are limited by the efficiency with which information can be combined across the ears, a process that presumably precedes the determination of spectral shape.

We note that our dichotic thresholds are about 7-9 dB lower than those obtained by Green and Kidd (1983), who used a similar presentation with 3- and 21-component backgrounds. These investigators also found performance with dichotic stimuli to be substantially inferior to that obtained with diotic stimuli. However, it is unclear to what extent the discrepancy between their dichotic thresholds and ours was influenced by their listeners' substantial prior experience with the diotic presentations.

We also considered how binaural interaction may have influenced our dichotic thresholds. From the viewpoint of most models of binaural hearing (see, for example, Colburn and Durlach, 1978), one prerequisite for binaural interaction is the presence of energy in corresponding frequency bands, or channels, at the two ears; i.e., neural events can only be compared across pairs of fibers with similar characteristic frequencies.

Recall that for our dichotic stimuli, the 1-kHz signal component was absent from the ear that contained the multicomponent background. To the extent that components in the profile surrounding the 1-kHz region fell within a common “binaural critical band” with the 1-kHz component in the opposite ear, binaural interaction could occur. If so, the listener could, theoretically, detect the presence of the signal by comparing the interaural intensitive disparities (IIDs) in the two intervals. During a signal interval, the IID would favor (relatively) the ear that contained the 1-kHz component to which the increment was added. This possibility exists even though the waveforms in the two ears' 1-kHz peripheral filters would not be highly correlated. Thresholds for IIDs have been shown to be as small as 0.4 dB (−26.5 dB, using our dependent measure) with interaurally uncorrelated signals (Nuetzel, 1982).
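The conversion from a 0.4-dB IID to −26.5 dB in our dependent measure can be reproduced by treating the level difference as if it were produced by an in-phase increment added to the component, so that a level change ΔL corresponds to a signal-to-component ratio of 20 log10(10^(ΔL/20) − 1). A minimal sketch of that conversion and its inverse (function names hypothetical):

```python
import math


def level_change_to_ratio_db(delta_level_db):
    """Signal-to-component ratio 20*log10(delta_a/a) whose in-phase addition
    raises the component level by delta_level_db."""
    return 20 * math.log10(10 ** (delta_level_db / 20) - 1)


def ratio_db_to_level_change(ratio_db):
    """Inverse: level change (dB) produced by adding a signal at ratio_db."""
    return 20 * math.log10(1 + 10 ** (ratio_db / 20))


print(f"0.4-dB level change -> {level_change_to_ratio_db(0.4):.1f} dB")
```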

The likelihood that a listener could use such a cue would increase as the number of components in the profile increased, bringing components closer to the 1-kHz region. For example, in the case of the three-component complex, the “profile” channel contained the frequencies 200 Hz and 5 kHz. There is little possibility that either of these components could interact binaurally with the 1-kHz component in the opposite ear, regardless of the value of the “binaural critical band” one chooses to accept (e.g., Bourbon, 1966; Sever and Small, 1979; Sondhi and Guttman, 1966). In contrast, for the 81-component complex, the components closest to 1 kHz in the “profile” ear are 960 and 1041 Hz, which lie within accepted values of a critical bandwidth.
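The component frequencies quoted for the three- and 81-component complexes follow from equal logarithmic spacing. Assuming, as those values imply, that every complex spans 200 to 5000 Hz, the sketch below finds the components on either side of 1 kHz and compares their separation with the 160-Hz resolution bandwidth estimated in experiment 1. The helper names are illustrative.

```python
def log_spaced(n, f_lo=200.0, f_hi=5000.0):
    """n components equally spaced on a log-frequency axis from f_lo to f_hi."""
    ratio = (f_hi / f_lo) ** (1.0 / (n - 1))
    return [f_lo * ratio ** k for k in range(n)]


def neighbors_of(freqs, target=1000.0, tol=1e-6):
    """Nearest components strictly below and above target (target itself excluded)."""
    below = max(f for f in freqs if f < target - tol)
    above = min(f for f in freqs if f > target + tol)
    return below, above


for n in (3, 81):
    lo, hi = neighbors_of(log_spaced(n))
    print(f"{n:2d} components: {lo:.1f} Hz and {hi:.1f} Hz (separation {hi - lo:.1f} Hz)")
```

For 81 components the neighbors are about 960.6 and 1041.1 Hz, roughly 80 Hz apart, comfortably inside a 160-Hz band; for three components they are 200 and 5000 Hz.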

The data of Fig. 5 do not support the notion that listeners were utilizing binaural cues, because the number of components that compose the profile appears to have little, if any, systematic effect on the thresholds.
IV. GENERAL DISCUSSION

We have stated that profile analysis, or the detection of a change in spectral shape, appears to be a “global” process that relies on simultaneous comparisons across a wide range of independent frequency channels. Experiments 1 and 2 were designed to yield independent estimates of the bandwidth of these channels, what we have called the “resolution bandwidth.” The estimate of 160 Hz around 1 kHz, obtained in experiment 1, is consistent with the accepted values of a critical bandwidth, and thus supports our contention that the decrements in performance observed with increasing spectral density (Fig. 1) are, in fact, due largely to masking.

The indirect estimate of the resolution bandwidth around 1 kHz, yielded by the data of experiment 2, was some 1.5 to 2 times larger than that obtained in experiment 1. This disparity leads us to speculate that the task of detecting an increment to a single component in a “flat,” multicomponent background and that of discriminating a “rippled” from a “flat” spectrum differ in quite complex ways. One may not be able to simply extrapolate from one to the other.

This finding is not unique and, in one respect, these data are consistent with those obtained in an earlier investigation (Green, 1986). In that study, we reported that we were unable to predict listeners’ ability to discriminate flat from rippled spectra on the basis of their sensitivity to changes in the intensity of a single component unless we assumed that performance in the former task was mediated by a small number of widely spaced channels across the spectrum, a conclusion that is entirely consistent with the data obtained in the present study. One way in which an effectively small number of channels may arise is if the channels are not independent but rather are correlated in a manner suggested by Durlach et al. (1986). Regardless of the mechanism, it is difficult to understand why the number of effective channels for detecting the presence of spectral ripple is smaller than that which appears to be utilized when the task involves detecting increments to a single component. The data from experiment 1 show that thresholds continue to decline as the number of components is increased from 3 to 21, which suggests that, for that task, there are many more than five or so effective bands.

It is difficult to understand why the effective bandwidth in these two tasks appears to differ so greatly. We are currently investigating the nature of these two tasks in greater detail.

The results of experiment 3, like the data of Green and Kidd (1983), suggest that although spectral analysis can be achieved using information across ears, performance is inferior to that obtained with diotic stimuli. However, our dichotic thresholds were somewhat lower than those obtained by Green and Kidd. The present data do not support the notion that listeners were utilizing any binaural cues. In future publications, we will report how binaural cues may affect the discrimination of spectral shape when a variety of dichotic configurations is employed.
kHz should have been employed. Because the cutoff was 10 kHz, one or two “image” components in the region of 9 kHz were present in the passband of the filter. We reran several of the conditions employing a low-pass cutoff of 6 kHz. The thresholds we obtained were quite close to, and did not differ in any systematic fashion from, those presented in Fig. 5.
In most of the previous studies (see Green, 1987) concerning the detection of a change in
spectral  shape,  or  “profile  analysis,”  the  listener’s  task  was  to  detect  an  increment  to  a  single 
component  of  an  otherwise  equal-amplitude,  multicomponent  background.  An  important 
theoretical  issue  is  whether  listeners’  sensitivity  to  more  complex  spectral  changes  can  be 
predicted  from  these  results.  In  the  present  investigation,  the  sensitivity  of  a  single  group  of 
listeners  to  a  wide  variety  of  simple  and  complex  spectral  changes  was  determined.  After 
collecting  the  data,  it  was  noted  that  almost  all  the  thresholds  could  be  predicted  by  a  simple 
calculation  scheme  that  assumed  detection  of  a  change  in  spectral  shape  occurs  when  the 
addition  of  the  signal  to  the  flat,  multicomponent  background  produces  a  sufficient  difference 
in  level  between  only  two  regions  of  the  spectrum.  Unfortunately,  this  scheme,  while  successful 
for  our  limited  set  of  data,  fails  to  account  for  other  “profile"  data,  namely,  those  obtained 
when  the  number  of  components  is  altered. 

PACS numbers: 43.66.Fe, 43.66.Ba, 43.66.Jh


INTRODUCTION 

A number of previous publications by Green and his colleagues (see Green, 1987, for a review) have described listeners’ ability to detect changes in spectral shape, a process termed “profile analysis.” In most of those studies, the standard or background stimulus consisted of a number of equal-amplitude components spaced at equal logarithmic intervals in frequency. The signal, when added to the standard, produced a change in this spectrum: an increment to a single component of the multicomponent background. Thus a common spectral change was simply a “bump” in the otherwise flat spectrum.

An important theoretical issue is whether listeners’ sensitivity to more complex spectral changes can be predicted from their sensitivity to increments to a single component of a 21-component background. While some limited attempts to address this question have been made in the past (Green, 1986), we wished to address it more thoroughly by determining the sensitivity of a single group of listeners to a wide variety of simple and complex spectral changes. By doing so, we hoped to describe a model that would account for all the data. Although a simple calculation scheme predicts the present data rather well, it clearly fails as a general model of profile analysis.

I.  GENERAL  PROCEDURE 

In all the experiments described below, the stimuli were 21-component complexes with equal logarithmic frequency spacing between adjacent components. The lowest frequency was 200 Hz; the highest was 5000 Hz.
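The component frequencies can be generated directly: with 21 components spanning 200 to 5000 Hz, component k (counting from 0) lies at 200·25^(k/20) Hz. The short sketch below lists the five signal frequencies used in experiment 1, which correspond to components 2, 7, 11, 15, and 20 counting from 1; the text quotes them truncated to whole hertz. Function and variable names are illustrative.

```python
import math


def log_spaced(n=21, f_lo=200.0, f_hi=5000.0):
    """n components equally spaced on a log-frequency axis from f_lo to f_hi."""
    ratio = (f_hi / f_lo) ** (1.0 / (n - 1))
    return [f_lo * ratio ** k for k in range(n)]


freqs = log_spaced()
signal_components = (2, 7, 11, 15, 20)  # 1-based component numbers
# Truncate to whole hertz; the small tolerance guards against 999.999... at 1 kHz.
print([math.floor(freqs[k - 1] + 1e-6) for k in signal_components])
```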

All stimuli were generated and presented via a PDP-11/73, which also controlled the experimental timing and the collection of responses. The stimuli were played through 16-bit D/As at a sampling rate of 25 kHz and were low-pass filtered at 10 kHz. The duration of each stimulus was 100 ms with 10-ms cos² rise/decay ramps. The stimuli were presented diotically over TDH-50 earphones to three listeners with normal hearing, who were seated in separate sound-treated rooms.

A two-alternative, forced-choice procedure was used. Each trial consisted of two 100-ms observation intervals separated by 500 ms. Intervals were marked by a visual display at the listener’s response box. Feedback was provided for 200 ms after the listener responded.

During  one  observation  interval,  the  multicomponent 
background  was  presented.  All  components  of  this  standard 
were  equal  in  amplitude.  The  other  interval  contained  the 
standard  plus  the  signal.  The  signal  altered  the  amplitude  of 
one  or  more  components  of  the  standard  and  occurred  with 
equal  a  priori  probability  in  the  first  or  second  interval. 

The level of the signal was varied adaptively in order to estimate the level that would produce 79.4% correct (Levitt, 1971). The level was decreased by 4 dB following three correct responses and increased by 4 dB following one incorrect response. After four “reversals,” this step size was reduced to 2 dB. Trials were run in blocks of 50, and each run produced approximately 10 reversals. Threshold was defined as the mean of the signal level across the last even number of reversals, excluding the first four. Twenty-four such estimates were obtained for each listener and condition. The mean of these estimates, averaged across listeners, is the dependent variable in all these experiments.
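The adaptive rule can be sketched as follows. An n-down, 1-up track converges on the percent correct p at which a run of n correct responses has probability 0.5, so p = 0.5^(1/3) ≈ 0.794 for three-down and 0.707 for two-down (Levitt, 1971). The step sizes, the switch after four reversals, and the even-reversal averaging follow the text; how reversals are recorded and which reversal is dropped to leave an even number are our assumptions, and `respond` is a hypothetical stand-in for the listener.

```python
def convergence_probability(n_down):
    """Percent correct tracked by an n-down, 1-up rule: p**n_down = 0.5."""
    return 0.5 ** (1.0 / n_down)


def run_staircase(respond, start_db=0.0, n_trials=50, n_down=3,
                  big_step=4.0, small_step=2.0, n_big_reversals=4):
    """Run one adaptive block; return (threshold estimate, reversal levels)."""
    level = start_db
    run = 0          # consecutive correct responses at the current level
    direction = 0    # -1 = last move was down, +1 = up, 0 = no move yet
    reversals = []   # levels at which the track changed direction
    for _ in range(n_trials):
        if respond(level):
            run += 1
            if run < n_down:
                continue              # need more correct responses before moving
            run = 0
            move = -1
        else:
            run = 0
            move = +1
        if direction and move != direction:
            reversals.append(level)
        direction = move
        step = big_step if len(reversals) < n_big_reversals else small_step
        level += move * step
    usable = reversals[n_big_reversals:]      # exclude the first four reversals
    if len(usable) % 2:
        usable = usable[1:]                   # keep the last even number
    threshold = sum(usable) / len(usable) if usable else float("nan")
    return threshold, reversals


# Idealized listener: always correct above -10 dB, always wrong otherwise.
threshold, reversals = run_staircase(lambda level: level > -10.0)
print(threshold, len(reversals))
```

With this idealized listener the track settles into alternating reversals at −8 and −10 dB, so the estimate is −9 dB; a real listener responds probabilistically and the track converges near the 79.4% point instead.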

The overall level of the stimuli was varied over a 20-dB range in 1-dB steps. A value was chosen randomly on each and every presentation to prevent listeners from basing their judgments on absolute level rather than on spectral shape. The median level was 50 dB SPL per component. Sensitivity to a change in the spectrum is reported as the ratio, in dB, of the level of the signal to the level of the corresponding component or components in the background. For example, if the amplitude of the signal component were equal to that of the corresponding component in the background, we would say the signal-to-background ratio is 0 dB. If the signal changes the amplitude of more than one component, we report the root-mean-square (rms) amplitude of the signal re: the standard amplitude.
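The dependent measure reduces to a one-line conversion. For a multicomponent signal, the sketch below assumes the rms is taken over the components the signal alters; the text does not say whether unaltered components enter the average. The function name is illustrative.

```python
import math


def signal_re_standard_db(signal_changes, component_amp=1.0):
    """rms amplitude of the signal's per-component changes re: one standard
    component, in dB. signal_changes holds the amplitude change (positive or
    negative) for each altered component, in the same units as component_amp."""
    rms = math.sqrt(sum(c * c for c in signal_changes) / len(signal_changes))
    return 20 * math.log10(rms / component_amp)


print(signal_re_standard_db([1.0]))                      # equal amplitudes: 0 dB
print(signal_re_standard_db([0.1] * 10 + [-0.1] * 10))   # +/-0.1 on 20 components
```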

II. EXPERIMENT 1: SINGLE-INCREMENT THRESHOLDS

As noted in Sec. I, we wished to determine whether listeners’ sensitivity to complex changes in spectral shape could be predicted from their sensitivity to increments of a single component in a “flat,” multicomponent background. The first problem one encounters is that previous data (Green et al., 1987) indicate that the detectability of an increment to a single component in the spectrum varies greatly as a function of the frequency of the component. Why this is so is an interesting issue in itself. The result has been known for some time (Green and Mason, 1985), but, as yet, we know of no satisfactory theoretical explanation for it.

The  data  obtained  in  this  first  experiment  served  as  a 
basis  for  predicting  listeners’  sensitivity  to  complex  changes 
in  spectral  shape.  In  addition,  obtaining  these  data  allowed 
us  to  determine  whether  the  performance  for  this  particular 
group  of  listeners  was  typical  of  that  observed  for  the  many 
listeners  who  have  been  tested  previously. 

The standard was composed of 21 equal-amplitude components ranging from 200 to 5000 Hz, equally spaced on a logarithmic frequency scale. The signal consisted of an in-phase addition to a single component of the standard. Five different frequencies were selected for the signal: 234, 525, 1000, 1903, and 4256 Hz. The frequency of the signal was fixed during a block of trials.

A.  Results  and  discussion 

The data are presented in Fig. 1. The frequency of the signal is plotted logarithmically along the abscissa; signal threshold in dB is displayed along the ordinate. Each point represents the mean of the thresholds obtained from the three listeners. The error bars represent the standard error of the mean computed across the data from all listeners. The dotted line represents the thresholds estimated by Green et al. (1987) in an extensive study of the effects of signal frequency.

FIG. 1. Threshold for detection of an increment to a single component of a multicomponent background as a function of the frequency of the component. Circles represent thresholds averaged across listeners. Error bars represent the standard error of the mean computed across the data from all listeners. The dotted line represents data obtained by Green et al. (1987). Note that a three-down, one-up procedure (79.4% correct) was employed in the present study, whereas a two-down, one-up procedure (70.7% correct) was employed in the previous study.

Our average data indicate that the listeners were most sensitive to increments in the middle region of the spectrum. Thresholds differed by less than 2.5 dB for the 525-, 1000-, and 1903-Hz signals. The greatest sensitivity was observed with the 1000-Hz signal, which yielded a threshold of −15.75 dB. The 234- and 4256-Hz signals yielded somewhat poorer thresholds of −7.58 and −10.51 dB, respectively.

These data are entirely consistent with those obtained previously (Green and Mason, 1985; Green et al., 1987). Note that Green et al. used a two-down, one-up adaptive procedure that estimates the signal level yielding 70.7% correct. In the present study, we employed a three-down, one-up procedure that estimates the level for 79.4% correct. Thus our listeners are, on average, somewhat more sensitive than those who participated in the previous study. Because d' is approximately proportional to the energy of the signal (Green et al., 1987), the difference is equivalent to a change in signal level of about 3.6 dB.

The thresholds reported above are for the simplest spectral manipulation, the addition of an increment to a single component of an otherwise flat, multicomponent background, and they establish that our listeners’ data are typical of those obtained previously. We now turn our attention to a series of experiments in which we measured listeners’ sensitivity to more complex spectral manipulations.

III.  EXPERIMENT  2A:  STEP  SPECTRA 

In this experiment, the standard was composed of the equal-amplitude, 21-component background. The signal caused a change in the amplitude distribution of the components over the entire frequency range. Two types of change were studied. In the “step-up” condition, the amplitudes of all components above some critical frequency were increased by the same amount while, below that frequency, the amplitudes were decreased by the same amount. At the critical frequency, what we call the “step frequency,” the amplitude was left unaltered. In the “step-down” condition, the same procedure was employed but the frequency scale was reversed. Five frequencies, the same as those employed in experiment 1, were chosen as the step frequencies.
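A step signal can be described by the component amplitudes of the standard plus signal. The sketch below assumes that “the same amount” means the same amplitude change δ on every altered component, with δ set by the signal-to-component ratio in dB (20 log10 δ); the component at the step frequency is left unaltered. Index 10 is the 1-kHz component of the 21-component standard, and the function name is hypothetical.

```python
def step_spectrum(step_index, ratio_db, n=21, step_up=True):
    """Amplitudes (standard component = 1.0) of standard plus step signal."""
    delta = 10 ** (ratio_db / 20)   # per-component amplitude change
    amps = []
    for k in range(n):
        if k == step_index:
            amps.append(1.0)                    # step frequency: unaltered
        elif (k > step_index) == step_up:
            amps.append(1.0 + delta)            # raised side of the step
        else:
            amps.append(1.0 - delta)            # lowered side of the step
    return amps


up = step_spectrum(10, -15.0)
down = step_spectrum(10, -15.0, step_up=False)
print([round(a, 3) for a in up[8:13]])
print([round(a, 3) for a in down[8:13]])
```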

A.  Results  and  discussion 

Figure 2 displays the average thresholds for the step-up (triangles) and step-down (inverted triangles) signals as a function of the step frequency. It is important to note that, for these signals, as well as for those described below, we have plotted the rms level of the signal re: the amplitude of a single component in the background. The solid line represents our calculated thresholds and will be discussed later, as will the data from the other experimental conditions that appear in the left-hand portion of the graph.

The data of Fig. 2 are similar to those of Fig. 1 in that a change in spectral shape appears to be most detectable when the step occurs in the midfrequencies and least detectable at the extremes. Thresholds for the step-up and step-down conditions were virtually identical for three of the five frequencies tested. The greatest sensitivity was, once again, obtained at 1 kHz, which yielded a threshold of about −23 dB. At 4256 Hz, the threshold of −20.5 dB for step-up was slightly lower than that of −17.9 dB obtained with the step-down signal. A considerably larger discrepancy occurred at 234 Hz, where the threshold for step-down was −15.0 dB, while that for step-up was −6.0 dB. We will discuss these discrepancies in greater detail after presenting a simple calculation scheme for predicting these complex spectral changes.

FIG. 2. Data and predictions for experiment 2. Average thresholds for step-up (triangles) and step-down (inverted triangles) are plotted as a function of the frequency of the step. Error bars represent the standard error of the mean computed across the data for all listeners. The solid line represents predictions from the calculation scheme. Obtained and predicted thresholds for tilt-up (squares), tilt-down (circles), and alternation (diamonds) are displayed in the left portion of the graph. Open symbols represent obtained thresholds; solid symbols represent predictions derived from the calculation scheme.

Except for the step-up condition at 234 Hz, the thresholds obtained with each step frequency are lower than those obtained for single increments at the same frequency (Fig. 1). An important clue to understanding this difference comes from considering the changes in level produced by adding the signal to the standard in each case.

In the case of an increment to a single component of an otherwise flat, multicomponent background, if the level of the signal re: the component to which it is added is −15 dB, a relative increase of 1.42 dB is produced at the frequency of the signal. In the case of a step signal, recall that the components below the step frequency are decreased (increased), while those above the step frequency are increased (decreased). In that case, a −15-dB signal will cause the components that are incremented to be raised 1.42 dB above the nominal background level and those that are decremented to be lowered by 1.70 dB. The total difference in level would then be 3.12 dB, a larger difference than that obtained for the single increment. The reader should note that the random variation of the level of presentation within and across trials encourages the listener to compare the relative levels in spectral regions above and below the step frequency.
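The level changes quoted above follow from adding or subtracting an in-phase signal of relative amplitude 10^(S/20) to a component of unit amplitude, where S is the signal-to-component ratio in dB. A minimal check (function name illustrative):

```python
import math


def level_changes(ratio_db):
    """Level change (dB) of a unit component raised or lowered by an in-phase
    signal at ratio_db, and the resulting difference between the two."""
    d = 10 ** (ratio_db / 20)
    up = 20 * math.log10(1 + d)
    down = 20 * math.log10(1 - d)
    return up, down, up - down


up, down, diff = level_changes(-15.0)
print(f"up {up:.2f} dB, down {down:.2f} dB, difference {diff:.2f} dB")
```

For S = −15 dB this gives about +1.42 dB, −1.70 dB, and a 3.12-dB difference, matching the figures in the text.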


B.  The  two-channel,  level-difference  calculator 

In attempting to explain quantitatively the step-up/step-down data, we considered a simple calculation scheme that assumes detection of a change in spectral shape occurs when the addition of the signal to the flat multicomponent background produces a reliable and sufficient difference in level between only two regions of the spectrum. The single-increment data of Fig. 1 were used as the basis for our calculations.

Assume a level, $x_i$ ($i = 1, 2, \ldots, 21$), is measured in each
of 21 frequency channels (corresponding to the frequencies
of the discrete components of our stimuli). Each of these
measures is assumed to be contaminated by independent
Gaussian noise with mean zero and variance $\sigma_i^2$. Thus each
$x_i$ is assumed to have a normal distribution with mean $m_i$
and variance $\sigma_i^2$. In a given experiment, two channels, $i$ and $j$,
are selected and the decision is based on the difference in
level ($x_i - x_j$) between only those two channels on each and
every presentation.

When the equal-amplitude standard is presented,
$m_i = m$ for all channels and, therefore, $m_i - m_j = 0$ for all
$i, j$. In this case, $(x_i - x_j)$ is drawn from a normal
distribution with mean zero and variance $(\sigma_i^2 + \sigma_j^2)$.

When the standard plus the signal is presented,
$\Delta = (m_i - m_j)$ and $(x_i - x_j)$ is drawn from a normal
distribution with mean $\Delta$ and variance $(\sigma_i^2 + \sigma_j^2)$. The
detectability of the signal can be expressed as:

$$d'_{ij} = \frac{\Delta}{(\sigma_i^2 + \sigma_j^2)^{0.5}}. \qquad (1)$$

To calculate the d' for a given experimental condition,
we assume it is maximized over the $\binom{21}{2} = 210$
possible pairs ($i$ and $j$) of channels. For many conditions,
this choice is simple. For example, when the signal
consists of an increment to a single, say the $k$th, component,
then $m_i = m$ for all $i \neq k$. In this case, $\Delta = 0$ except for
$m_k - m_j$. Because the value of $m_k - m_j$ is constant regardless
of the channel to which $j$ corresponds, d' is maximized by
choosing $j$ so that $(\sigma_j^2 + \sigma_k^2)^{0.5}$ is minimized.
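A minimal Python sketch of the two-channel calculation: Eq. (1) is evaluated for every one of the 210 channel pairs and the largest d' is kept. Level changes are expressed in dB relative to the flat standard. The per-channel standard deviations in the usage example are hypothetical placeholders (smallest at the 1-kHz channel, index 10), not values from the study.

```python
import itertools
import math


def d_prime(delta_db: float, sigma_i: float, sigma_j: float) -> float:
    """Eq. (1): mean level difference divided by the rms of the two channel noises."""
    return abs(delta_db) / math.sqrt(sigma_i ** 2 + sigma_j ** 2)


def best_pair(level_change_db, sigma_db):
    """Return (d', i, j) maximizing Eq. (1) over all pairs of channels.

    level_change_db[i]: change in mean level of channel i produced by the signal (dB).
    sigma_db[i]: standard deviation of the internal noise in channel i (dB).
    """
    best = (0.0, None, None)
    for i, j in itertools.combinations(range(len(level_change_db)), 2):
        delta = level_change_db[i] - level_change_db[j]
        d = d_prime(delta, sigma_db[i], sigma_db[j])
        if d > best[0]:
            best = (d, i, j)
    return best


# Hypothetical example: noise grows away from the 1-kHz channel (index 10).
sigmas = [0.854 + 0.1 * abs(i - 10) for i in range(21)]
changes = [0.0] * 21
changes[3] = 1.42  # a single increment to the 4th component
print(best_pair(changes, sigmas))
```

For a single increment, every pair that yields a nonzero d' includes the incremented channel, so the search reduces to finding the partner with the smallest standard deviation, exactly as the text argues.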

In order to apply the scheme described above, it was
necessary to estimate a single parameter, the variance
associated with the 1-kHz channel. Because the 1-kHz signal
(signal 11) yielded the lowest threshold, the channel
containing this frequency must be assumed to have the smallest
variance. We denote its standard deviation as $\sigma_j$. Therefore,
consistent with the strategy for maximizing d' described
above, when a single increment occurs at any of the 20
frequencies other than 1 kHz, the level in its channel is
compared to that at 1 kHz. Once the value of $\sigma_j$ (corresponding
to the 1000-Hz channel) is chosen, all the $\sigma_i$'s
corresponding to each of the other frequencies are determined,
because, for each other frequency, $d'_{ij}$ and $\Delta$ are known.

The value of $\sigma_j$ was calculated in an iterative fashion by
minimizing the rms error of our calculated thresholds for all
21 signal conditions presented in these experiments. The
value of d' at threshold, 1.16, corresponds to the 79.4%-correct
level of performance estimated by our adaptive task.
The value of $\sigma_j$ determined by the iteration was 0.854 dB.
The exact value of this constant does not have a very large
effect on the predicted thresholds because $\sigma_j$ is the smallest
standard deviation among all 21 channels. Thus, in calculating
a d' value, the other standard deviation, $\sigma_i$, tends to
dominate [see Eq. (1)].
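The two constants in this paragraph can be related in a few lines of Python. For an unbiased observer in a two-alternative forced-choice task, proportion correct is Φ(d'/√2), and d' = 1.16 gives about 0.794, the level quoted above. The second function sketches how, under the single-increment strategy described earlier, a channel's σ_i could be backed out of a threshold once σ_j = 0.854 dB is fixed. The -10 dB threshold in the example is a placeholder, not a measured value, and the function names are illustrative.

```python
import math


def pc_2afc(d_prime: float) -> float:
    """Proportion correct in 2AFC: Phi(d'/sqrt(2)) = 0.5 * (1 + erf(d'/2))."""
    return 0.5 * (1 + math.erf(d_prime / 2))


def increment_delta_db(threshold_db: float) -> float:
    """Level change (dB) of the incremented channel for a signal at threshold_db re: the component."""
    return 20 * math.log10(1 + 10 ** (threshold_db / 20))


def sigma_from_threshold(threshold_db: float, d_prime: float = 1.16,
                         sigma_ref: float = 0.854) -> float:
    """Solve Eq. (1) for sigma_i when the channel is compared with the 1-kHz channel."""
    delta = increment_delta_db(threshold_db)
    variance = (delta / d_prime) ** 2 - sigma_ref ** 2
    if variance <= 0:
        raise ValueError("threshold too low to be consistent with sigma_ref")
    return math.sqrt(variance)


print(f"{pc_2afc(1.16):.3f}")          # proportion correct at d' = 1.16
sigma_i = sigma_from_threshold(-10.0)  # placeholder threshold
print(f"{sigma_i:.3f} dB")
```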

Using the $\sigma_i$'s so derived, linear interpolation was used
to estimate $\sigma_i$'s for frequencies other than the five for which
data were obtained, and thresholds for the step stimuli were
calculated. The results of these calculations are represented
by the solid line in Fig. 2.

As  Fig.  2  shows,  the  results  of  the  calculations  fit  the 
data  quite  closely  except  for  the  thresholds  obtained  with  the 
step-up  signal  at  4256  and  234  Hz.  Our  analysis  predicts 
that,  for  all  frequencies,  step-up  and  step-down  thresholds 
should  be  identical.  As  mentioned  earlier,  this  was  clearly 
not  the  case  for  step  frequencies  of  4256  and  234  Hz.  While 
the  relatively  small  discrepancy  at  4256  Hz  may  be  due  to 
random  variation  in  the  data,  the  large  disparity  between  the 
step-up and step-down thresholds at 234 Hz cannot be
dismissed so easily. This trend, which occurred for all three
listeners, also occurred for another group of listeners who
were run previously in a pilot experiment. We know of no
logical  explanation  for  its  existence. 

IV. EXPERIMENT 2B: TILTED SPECTRA

Once again, the standard or background consisted of the
equal-amplitude, 21-component profile. Two different
signals were used; each contained the same 21 components as
the standard. The addition of the signal to the standard
produced a spectrum that either tilted up or down about the
central, 1-kHz, component. In the first case, the addition of
the signal to the standard caused the amplitude of each
successive component to increase linearly with component
number. In the second case, the amplitude of each successive
component decreased with component number. By varying
the relative level of the signal with respect to the standard,
we were able to vary the magnitude or slope of the spectral
tilt. We should note that the "tilt" was linear on a pressure or
amplitude scale. That is, it was not linear in dB. For the small
amplitudes of the signal required for detection by our
listeners, however, the tilt was, in fact, essentially linear in dB as
well as amplitude. The listener's task was to discriminate the
flat from the tilted spectra. Separate thresholds were
estimated for both the positive and negative tilts.
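The claim that a small amplitude tilt is essentially linear in dB can be checked numerically. The Python sketch below assumes a hypothetical parameterization in which the signal amplitude of component i is proportional to (i - 11), so that the tilt pivots on the central 1-kHz component; the slope c is an illustrative value, not the measured threshold. For c = 0.01 the departure from a straight line in dB stays below about 0.05 dB, and it grows quickly for steeper tilts.

```python
import math


def tilted_levels_db(c: float, direction: int = +1, n: int = 21):
    """Level of each component (dB re: the flat standard) after adding a linear tilt.

    Hypothetical parameterization: component i (1..n) has amplitude
    1 + direction * c * (i - center), where the standard amplitude is 1 and
    center is the middle (1-kHz) component. Requires c * (n - 1) / 2 < 1.
    """
    center = (n + 1) // 2
    return [20 * math.log10(1 + direction * c * (i - center)) for i in range(1, n + 1)]


def max_deviation_from_linear_db(c: float, direction: int = +1, n: int = 21) -> float:
    """Largest gap (dB) between the exact levels and the small-signal straight line."""
    center = (n + 1) // 2
    slope_db = 20 / math.log(10) * direction * c  # derivative of 20*log10(1 + x) at x = 0
    levels = tilted_levels_db(c, direction, n)
    return max(abs(level - slope_db * (i - center))
               for i, level in zip(range(1, n + 1), levels))


for c in (0.005, 0.01, 0.03):
    up = max_deviation_from_linear_db(c, +1)
    down = max_deviation_from_linear_db(c, -1)
    print(f"c = {c}: max deviation {up:.3f} dB (tilt up), {down:.3f} dB (tilt down)")
```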

A.  Results  and  discussion 

The results for the tilt-up and tilt-down stimuli are
displayed in Fig. 2 as the open squares and circles, respectively.
There is little difference in the thresholds for the two
conditions, with tilt-down yielding a slightly lower threshold of
-18.8 dB as compared to -17.5 dB obtained for tilt-up.
Our calculated thresholds for the tilt stimuli are shown by
the open and closed circles. As Fig. 2 indicates, our
calculated values of -17.9 and -18.9 dB for the tilt-up and
tilt-down conditions, respectively, mirror this trend and are
within 0.4 dB of the thresholds obtained.

These tilted spectra provide perhaps the most
illustrative demonstration of the operation of the calculation
scheme. If one were to ignore the notion of different
magnitudes of variance affecting the measurement of level on each
channel, then, clearly, the two channels that would yield the
greatest difference in level would be those at the extremes of
the spectrum, 200 and 5000 Hz. As shown by Eq. (1), however,
the maximal d' involves a trade-off between the difference
in level and the inherent variability of its measurement.
According to the scheme, d' is maximized for the tilt-up
spectrum when the difference in level between the 6th (447 Hz)
and 21st (5000 Hz) components is employed. For tilt-down,
d' is maximized when the 7th (525 Hz) and 21st channels are
used. The thresholds reported above, which are within 0.4
dB of those actually obtained, were based on the use of these
differences in level.

V.  EXPERIMENT  2C:  ALTERNATING  SPECTRUM 

In  this  experiment,  we  employed  a  signal  essentially 
identical to that used by Green and Kidd (1983). The signal
was  such  that,  when  it  was  added  to  the  background,  it 
caused  successive  components  to  be  alternately  incremented 
and  decremented. 

A.  Results  and  discussion 

The average threshold of -21.7 dB obtained for this
condition is plotted as the open diamond in Fig. 2. This value
is about 6 dB lower than that obtained for the single
increment (experiment 1) at 1 kHz. Green and Kidd (1983) also
measured a 6-dB difference for these two conditions.

Our calculated threshold of -21.8 dB for the alternating
spectrum is plotted as the solid diamond in Fig. 2. Once
again, our calculations were based on the assumption that
detection of a change in spectral shape occurs when the
addition of the signal to the flat, multicomponent background
produces a reliable and sufficiently large difference in level
between only two regions of the spectrum. For the alternating
spectrum, the difference in level between any two adjacent
components is identical. Recall, however, that these
differences are most detectable in the 1-kHz region. Therefore,
the difference in level between only two components in this
frequency region is used in calculating the threshold.

Once more the calculation scheme provides an easy way
to understand why this result occurs. The level of the signal
required for detection of a change in spectral shape is 6 dB
less in the case of the alternating spectrum than for a single
increment. Again, the essence of the explanation is that the
addition of the signal to the background produces
simultaneously both increments and decrements to the flat
background. Hence, a given level of the signal produces a greater
dB difference between two regions of the spectrum and a
larger d' than does a single increment. What is
counterintuitive about this account is that the use of only a single
difference in level describes the data despite the opportunity for
many such comparisons: ten independent pairs of
comparisons for the alternating spectrum.
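The essence of the 6-dB explanation can be made concrete with a small Python sketch. It compares level differences only and treats the noise terms of Eq. (1) as equal in the two cases, which is a simplifying assumption. A single increment of amplitude ratio r_s separates an incremented and an unchanged component by 20 log10(1 + r_s), whereas an alternating signal of ratio r_a separates adjacent components by 20 log10[(1 + r_a)/(1 - r_a)]. Setting the two equal gives r_a = r_s/(2 + r_s), so the alternating signal can be 20 log10(2 + r_s) dB weaker, which approaches 20 log10 2 ≈ 6.02 dB for small signals.

```python
import math


def single_increment_delta_db(r: float) -> float:
    """Level difference (dB) between an incremented component and an unchanged one."""
    return 20 * math.log10(1 + r)


def alternating_delta_db(r: float) -> float:
    """Level difference (dB) between adjacent incremented and decremented components."""
    return 20 * math.log10((1 + r) / (1 - r))


def alternating_advantage_db(r_single: float) -> float:
    """How much weaker (dB) an alternating signal can be for the same level difference."""
    # Solves alternating_delta_db(r_alt) == single_increment_delta_db(r_single).
    r_alt = r_single / (2 + r_single)
    return 20 * math.log10(r_single / r_alt)


for r in (0.001, 0.05, 0.16):
    print(r, round(alternating_advantage_db(r), 2))
```

Because the advantage equals 20 log10(2 + r_s), this level-difference argument predicts slightly more than 6 dB at larger signal levels.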

VI. EXPERIMENT 3: SINUSOIDALLY RIPPLED SPECTRA

In this experiment, the signal produced a sinusoidal
change in the amplitudes of the flat profile. That is, the
addition of the signal produced what we refer to as a
"sinusoidally rippled" spectrum.

A.  Procedure 

The standard waveform was the 21-component flat
spectrum  that  ranged  in  frequency  from  200-5000  Hz  with 
successive  components  spaced  equally  on  a  logarithmic 
scale.  The  addition  of  the  signal  produced  a  power  spectrum 
whose  magnitude  varied  sinusoidally  as  a  function  of  the 
logarithm  of  frequency.  We  measured  thresholds  for  ripples 
of  1,  5,  and  10  cycles. 

Specifically, the "signal" waveform was produced by
setting the amplitude of successive components, a(i),
according to the following equation:

$$a(i) = \sin\left[2\pi k \left(\frac{i}{M}\right)\right], \qquad i = 1, 2, \ldots, M,$$

where i is the number of the component, ranging in this case
from 1 to 21, a(i) is the amplitude of the ith component of
the signal spectrum, and k is the "frequency" of the ripple.
Recall that the first component, i = 1, corresponds to a
frequency of 200 Hz, and the last component, i = 21,
corresponds to a frequency of 5000 Hz.

The "depth" of the ripple resulting from the addition of
the signal to the standard waveform depends upon the ratio
of the amplitudes of the signal components to those of the
standard's equal-amplitude components. The depth of the
ripple is, of course, monotonically related to the
signal-to-standard ratio. We scaled the amplitude of this "signal" and
added each component in-phase (respecting sign) to the
corresponding component of the flat standard spectrum to
produce the change in the spectrum.

It should be noted that, by constructing the signal in the
manner described above, the rms of the amplitudes across
components is independent of the frequency of the ripple, k,
because, for each of the ripple frequencies used here, the 21
values of a(i) are the same; only their order within the set has
been changed. If the maximum value for a(i) is one, the rms
value is 0.707. We define the signal-to-standard ratio as the
ratio of the rms signal amplitude to the amplitude of any
component of the standard.
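The construction of the ripple signal and the claim that its rms amplitude does not depend on k can be checked with a short Python sketch. M = 21 and the ripple frequencies 1, 5, and 10 follow the text; the function names are illustrative.

```python
import math


def ripple_signal(k: int, m: int = 21):
    """Signal amplitudes a(i) = sin(2*pi*k*i/M) for components i = 1..M."""
    return [math.sin(2 * math.pi * k * i / m) for i in range(1, m + 1)]


def rms(values):
    """Root-mean-square value of a list of amplitudes."""
    return math.sqrt(sum(v * v for v in values) / len(values))


reference = sorted(ripple_signal(1))
for k in (1, 5, 10):
    a = ripple_signal(k)
    same_set = all(abs(x - y) < 1e-9 for x, y in zip(sorted(a), reference))
    print(k, round(rms(a), 4), same_set)
```

Because 1, 5, and 10 share no factor with 21, k·i (mod 21) runs through every residue, so each ripple uses the same 21 sine values in a different order.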


B. Results and discussion

The results are shown in Table I. The data indicate that
thresholds are fairly constant at about -23 dB for the three
ripple frequencies tested. The 5-cycle ripple appears to have
yielded slightly better performance than the two other
values. These data are consistent with those obtained earlier by
Green et al. (1987) and by Bernstein and Green (1987).

The calculated thresholds are quite close to the obtained
values. The largest discrepancy occurred at ten ripples,
where the obtained threshold was -21.8 dB, while the
predicted threshold was -24.7 dB. Our calculations exhibit a
slight monotonic decrease in threshold as the number of
ripples is increased from one to ten. This is due to the fact that,
as the number of ripples increases, the spectral "peaks" and
"valleys" increase in number and become more closely
spaced. Thus the two frequency channels whose levels are
compared to produce the maximal d' fall in an increasingly
narrow region around 1 kHz, where the variances ($\sigma_i^2$) are
smallest. This leads to increased sensitivity. Interestingly, in
the two previous studies mentioned above, this predicted
trend was obtained. However, in the present study, the
lowest obtained threshold occurs at five ripples.

TABLE I. Obtained and predicted thresholds (dB) for the sinusoidally rippled spectra.

                 Number of ripples
Threshold        1        5        10
Obtained       -22.4    -25.0    -21.8
Predicted      -22.2    -22.8    -24.7

VII.  GENERAL  DISCUSSION 

It is important to note at the outset that two restrictions
were present in all the experimental conditions we have
reported in this article. First, the standard spectrum from
which changes in spectral shape were detected was always
flat. Second, and probably more important, the standard was
defined by a fixed number of components (21). For this set
of restricted conditions, all the thresholds can be predicted
with good accuracy by calculating the difference in level
between only two frequency channels. The rms difference
between the calculated and obtained thresholds is only about
2.2 dB; that is, the calculations account for slightly greater
than 80% of the variance across all thresholds obtained. If
the prediction for the step-up signal at 234 Hz is excluded
from the analysis, then the rms difference drops to about 1.2
dB.

The problem with this approach is its lack of generality.
For example, it does not predict the decrease in threshold
that is observed for single increments as the number of
components in the profile is increased from 3 to about 21
components (Green et al., 1984; Green and Mason, 1985; Bernstein
and Green, 1987). The systematic addition of components
from 3 to 21 improves detection performance by
approximately 13 dB! The standard explanation for this
phenomenon is that, because of integration of information across
channels, the greater number of components leads to a better
estimate of the level of the flat spectrum. Such an argument
may be correct, but is completely inconsistent with the
calculation scheme we have used in this article. Consider a
signal consisting of an increment to the central, 1-kHz,
component of the 21-component background. The scheme assumes
that such a signal is detected on the basis of the difference in
level between the channel containing the 1-kHz component
and a single adjacent channel. Removing all but these two
components should, in principle, leave detection
performance unaffected. The data from previous studies,
particularly those of Green et al. (1984), demonstrate that this is
clearly not the case.

The comparison of these two sets of results leads to a
paradox. On the one hand, if the signal is a single increment
at 1 kHz, then the results indicate that components far
removed from the frequency of the signal enhance the
detectability of this increment. On the other hand, when the signal
produces changes in the standard that are widely distributed
across the spectrum, such as in the case of the alternating or
rippled spectra, only two components of the signal appear to
contribute to its detection. In short, it appears that the entire
spectrum contributes to an estimate of the flat, standard
spectrum, but only two channels contribute to the detection of
the signal.

Finally, we must compare this calculation scheme with
other models of profile analysis. There are, unfortunately,
few alternatives. Durlach et al. (1986), in a recent article,
suggest an optimum model to combine information across
different frequency channels. This model, by introducing
special assumptions, could be reduced to one that considers
only the difference in level between two channels. However,
their general model is considerably more complicated than
the simple calculation scheme presented here.

For the restricted conditions of the present experiment,
our simple calculation scheme provides better predictions
than more complicated, and generally more efficient,
detection procedures. As an example, compare the case of the
single increment (experiment 1) to the alternating
(experiment 2C) or rippled (experiment 3) spectra. According to
our calculations, no advantage is gained from this increased number of
changes; only the level of a single pair of components is
compared. Green (1986) has explored a more efficient way of
combining information (vector summation of d') over
different frequency channels with little success. Using an
optimum combination of 21 statistically independent channels,
his predictions were about 7 dB less than the obtained
thresholds.

Interestingly, our scheme is similar, in several respects,
to Zwicker's "excitation pattern" model for intensity
discrimination (for an overview, see Zwicker, 1970). Even a
stimulus with a very narrow spectrum, such as a sinusoid,
produces excitation in a number of critical bands. Changes
in the intensity produce changes in their corresponding
"excitation patterns." According to Zwicker's model, the
listener detects a change in intensity when the greatest change in
excitation level in a single critical band exceeds some
threshold value. Zwicker's excitation-pattern model was
constructed to account for the data obtained when the listener
was required to make successive comparisons of intensity. In
our case, the change in level between two channels or bands
is observed during a single presentation of the stimulus, and
must, perforce, be a simultaneous comparison of intensity.

Zwicker's model assumes that only the change in level
within a single channel is relevant despite the fact that many
channels may exhibit a change, a process not unlike that
which we have proposed. Florentine and Buus (1981) have
suggested an alternative version of Zwicker's model in which
successive changes in level within several critical bands are
integrated statistically (vector summation of d') and used as
the basis for detecting a change in the intensity of the
stimulus. As noted previously, Green's application of a similar
model to simultaneous discriminations greatly overpredicts
the data.

In  summary,  we  have  presented  a  simple  calculation 
scheme  that  predicts  the  detectability  of  complex  changes  in 
spectral shape. Despite its success for the limited
experimental conditions presented in this article, it clearly fails as a
general model. Understanding the nature of the experimental
restrictions in more detail may suggest ways to modify
the calculation scheme.

INTRODUCTION

In several recent papers, we have investigated the detection of a
change in spectral shape of a complex auditory signal. The discrimination
task involves a broadband 'standard' spectrum and some alteration of that
spectrum produced by adding a 'signal' to the standard. For most of the
experiments, we have used a standard that is composed of a set of
equal-amplitude sinusoidal components. The standard spectrum is, therefore,
essentially flat. In different experiments, different waveforms have been
added to this standard spectrum to create a change in spectral shape, and
the detectability of such changes has been measured. A signal commonly
used in these experiments was a single sinusoid added in-phase to some
component of the standard. Since this signal increases the intensity at
only one frequency region, we describe this situation as detecting a
'bump' in an otherwise flat spectrum. One experimental question is
whether a bump at one frequency region is easier to hear than a bump at some
different frequency region. Also, we might consider more complicated
changes in the spectra, such as a signal that produces changes in the
amplitudes of several components of the standard. How well are such
alterations of the acoustic spectra detected, and how is the detectability of
these general changes related to the detectability of an increment at one
frequency?


FREQUENCY  EFFECTS 

Before reporting on the detectability of more complicated signals, we
must begin by determining how changes in the frequency locus of a single,
sinusoidal signal affect the ability to detect a spectral change. The
experimental task is as follows. The standard is a 21-component complex
composed of equal-amplitude sinusoids spaced equally on a logarithmic
frequency scale, ranging from 200 to 5000 Hz. The ratio of successive
frequencies in the complex is, therefore, 1.175. Such a "uniform" standard
was selected because we may regard the cochlea as a linear receptor array,
where distance along the array is roughly proportional to the logarithm of
sound frequency. Our uniform standard then produces excitation at roughly
equal spatial intervals. We have also tested non-uniform standards with
unequal amplitudes or non-uniform frequency spacing between components
(Kidd, Mason, and Green, 1986), but it appears that the detection of an
increment in a single component of the complex is always more difficult
for those standards than when the same increment is made in the "uniform"
standard.
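The component frequencies of the uniform standard follow directly from the description: 21 components spaced equally on a logarithmic frequency axis from 200 to 5000 Hz. A minimal Python sketch (the function name is illustrative):

```python
def uniform_standard_frequencies(f_low: float = 200.0, f_high: float = 5000.0, n: int = 21):
    """Component frequencies (Hz) spaced equally on a logarithmic scale, plus their ratio."""
    ratio = (f_high / f_low) ** (1 / (n - 1))  # 25 ** (1/20) for the defaults
    return [f_low * ratio ** i for i in range(n)], ratio


freqs, ratio = uniform_standard_frequencies()
print(round(ratio, 3))            # ratio of successive frequencies
print([round(f) for f in freqs])  # component frequencies in Hz
```

The 11th (central) component falls at 1000 Hz, since 200 × 25^(1/2) = 1000.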


Before presenting our results, we should make clear one other
important detail of the experimental procedure. To ensure that the observers
are actually listening to a change in the shape of the spectrum, rather
than a change in absolute intensity level at some limited frequency
region, we randomly vary the overall level of the sound presented. Each
trial of the two-alternative forced-choice task contains two sound
presentations, the standard and the signal-plus-standard. The overall level for
each presentation is determined by selecting from a uniform distribution
of amplitude levels, typically having a range of 20 dB, in 1-dB steps.

The median of this distribution in the present experiment is 60 dB SPL,
but the exact value matters little (Mason, Kidd, Hanna, and Green, 1984).
Thus, the standard might have components presented at a level of 64 dB,
whereas the signal-plus-standard might be presented at an average
component level of 52 dB. In that case, the correct answer would be the
less intense sound. The duration of the presentations was about 100 ms,
and the onset and offset had short, 5-ms cosine ramps to diminish audible
transients.
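The level randomization can be sketched as follows. This sketch assumes that the 20-dB range consists of the 21 integer levels from 50 to 70 dB SPL (median 60 dB) and that the two presentations of a trial draw their levels independently; the function name and the seeded generator are illustrative only.

```python
import random


def draw_trial_levels(rng: random.Random, median_db: int = 60, range_db: int = 20):
    """Independent overall levels (dB SPL) for the two presentations of one 2AFC trial."""
    # 1-dB steps spanning range_db, centered on median_db (assumed 50..70 dB SPL).
    levels = list(range(median_db - range_db // 2, median_db + range_db // 2 + 1))
    return rng.choice(levels), rng.choice(levels)


rng = random.Random(1)
standard_db, signal_plus_standard_db = draw_trial_levels(rng)
print(standard_db, signal_plus_standard_db)
```

Because the two presentations can differ by up to 20 dB, far more than the roughly 1-dB change a threshold signal produces in one component, absolute level in any single region is not a reliable cue.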


Fig. 1. Threshold for a spectral bump as a function of frequency.

Figure 1 shows data on the detectability of an increment in a single
component of the uniform standard at several different signal frequencies.
The abscissa is the frequency of the signal, that is, the frequency of the
component that is added in-phase to the corresponding component of the
standard. The ordinate is the threshold for that signal as determined in
a two-alternative forced-choice task. The signal is adapted in level
using a two-down, one-up rule and thus estimates a 0.707 probability of
being correct. The value plotted as the threshold is the ratio of the
signal amplitude to the amplitude of that component of the standard, in
decibels. If the signal amplitude is one-eighth the amplitude of the
component of the standard at threshold, then the threshold value is about
-18 dB. This value corresponds to a Weber fraction of 1.03 dB. This
value appears to be near the minimum of the data shown in Fig. 1. At
frequencies lower or higher than about 1000 Hz, the signal becomes a bit more
difficult to hear, but the effects of frequency are not very great. Only
at the highest frequency tested, 4256 Hz, is the threshold elevated by
more than 5 dB.
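The conversions in this paragraph can be reproduced with a few lines of Python. The sketch converts an amplitude ratio to the threshold in dB, converts a threshold to a Weber fraction expressed as the level increase 20 log10(1 + ratio) of an in-phase increment, and computes the convergence point of a two-down, one-up track, where two successive correct responses occur with probability 0.5. Function names are illustrative.

```python
import math


def threshold_db(amplitude_ratio: float) -> float:
    """Signal amplitude re: the standard component, in dB."""
    return 20 * math.log10(amplitude_ratio)


def weber_fraction_db(threshold_db_value: float) -> float:
    """Level increase (dB) of the component when the signal is added in phase."""
    ratio = 10 ** (threshold_db_value / 20)
    return 20 * math.log10(1 + ratio)


def two_down_one_up_target() -> float:
    """Proportion correct p at which p * p = 0.5 (track equilibrium)."""
    return math.sqrt(0.5)


print(round(threshold_db(1 / 8), 2))       # one-eighth amplitude, in dB
print(round(weber_fraction_db(-18), 2))    # Weber fraction for a -18-dB signal
print(round(two_down_one_up_target(), 3))  # proportion correct tracked
```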

We should also make clear that this minimum at the middle frequency
region depends both on the absolute frequency value and the relative
position of the increment within the complex spectrum. Other experiments have
shown a minimum in the function at lower frequencies for a complex
occupying a frequency range of 200 to 2000 Hz. The minimum, however, is not
solely dependent on context: absolute frequency value is also important.
If the complex consists of frequencies ranging from 1000 to 10,000 Hz, the
smallest threshold occurs when the signal is presented at the lowest
frequency component (Green, Onsan, and Forrest, 1986).

For the remainder of this paper, we will consider more complicated
alterations in this standard spectrum. The standard spectrum will always
occupy the frequency range from 200 to 5000 Hz. For this frequency range,
Fig. 1 shows that changes in such spectra are approximately equal in
detectability as long as the frequency of such a change is less than
3000 Hz.


SINUSOIDAL  VARIATION  IN  THE  SPECTRA 

One way to learn something about the mechanisms responsible for
detecting these alterations in the acoustic spectra is to use sinusoidal
changes in the power spectra and to vary the frequency at which the
sinusoidal variation occurs. Because our spectra are defined at only a finite
number of points, the 21 frequencies of the components of our standard
spectra, we can alter the frequency of variation only over a limited range.
Figure 2 shows how we carried out this experimental manipulation. The
spectrum displayed at the left of the diagram shows a signal that produces
a single cycle of sinusoidal variation over the amplitudes of our
successive components. Recall that the frequencies of the components are
equally spaced along a logarithmic frequency axis. The next spectrum
shows two cycles of variation. As the frequency of variation is
increased, we finally reach the spectrum shown on the right side of the
figure. In this spectrum, successive components alternately increase or
decrease in amplitude, and no higher rates of variation can be achieved.

Fig. 2. Three different frequencies, k, of sinusoidal variation.


The following equation expresses how the variation was achieved. Let a[i]
be the amplitude of the ith component of the signal spectra, where i
ranges from the first component, a frequency of 200 Hz, to the last
component, M. In this experiment M = 21, and the frequency of the last component
is 5000 Hz. We set the amplitude of the ith component as follows:

$$a[i] = \sin\left(\frac{2\pi k i}{M}\right), \qquad i = 1, 2, \ldots, M \qquad \text{(Eq. 1)}$$

where k represents the 'frequency' of the variation in the amplitude
spectra (k = 1, 2, ..., 10). If we scale the amplitude of this 'signal' and add each
component in-phase to the corresponding component of the 'standard'
complex, in which each component is equal in amplitude, we produce the change
in spectral shape shown in Fig. 2. We speak of this variation in
amplitude as producing a ripple in the spectrum and refer to the parameter, k,
as the frequency of the ripple.

Before presenting the results, we should make some general comments
on this method of constructing the rippled spectrum and how changes in the
frequency parameter, k, affect various parameters of the resulting
spectrum. First, the root-mean-square (rms) of the amplitudes across the
21 components of the signal is independent of k, the frequency of the
ripple. If the maximum value of a[i] is unity, this rms value is 0.707.
When k shares no common factor with 21 (k = 1, 2, 4, 5, 8, or 10), this
independence follows because the 21 values of the set a[i] are the same
for every such k; only their order is changed, owing to the modular nature
of the sine function. For those values of k, the value of any function whose
domain is the set of 21 amplitude values, a[i], is independent of k. For
k = 3, 6, 7, or 9 the set of values differs, but the mean of the squared
amplitudes is still exactly one-half, so the rms value remains 0.707. Next,
we should note that a cosine ripple can be achieved by using the cosine rather than
the sine function in Eq. 1. Naturally, the cosine ripple has the same rms
amplitude as the sine ripple, and that value is also independent of the
frequency of the ripple, k.

To  construct  the  ripple  spectrum  actually  presented  to  the  listeners, 
we  first  scale  the  signal  and  add  it,  in-phase,  to  the  components  of  the 
standard  spectrum.  The  depth  of  the  resulting  ripple  depends  on  the  ratio 
of  the  amplitude  of  the  signal  components  to  the  amplitude  of  the  standard 
components. The components of the standard spectrum are all equal in
amplitude. It is convenient to use the rms amplitude of the 21 signal
components as our measure of signal amplitude. We will then refer to the
signal-to-standard  ratio,  meaning  the  ratio  of  the  rms  signal  amplitude  to 
the  amplitude  of  any  component  of  the  standard.  Obviously,  the  depth  of 
the  ripple  produced  in  the  resulting  spectra  is  monotonic  with  this 
signal-to-standard  ratio.  The  spectra  shown  in  Fig.  2,  for  example,  were 
constructed  with  a  signal-to-standard  ratio  of  0.1414. 
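To connect the ratio to the visible ripple, the sketch below (hypothetical helper `ripple_depth_db`) converts a signal-to-standard ratio into dB and into the peak-to-valley depth of the resulting spectrum, assuming unit-amplitude standard components and a k = 1 sine ripple. A ratio of 0.1414 is about -17 dB and corresponds to a peak signal amplitude of about 0.2 (0.1414 / 0.707). The ratio must stay small enough that no component amplitude goes negative.

```python
import numpy as np

def ripple_depth_db(ratio, k=1, M=21):
    """Peak-to-valley depth (dB) of the rippled spectrum for a given
    signal-to-standard ratio (rms signal amplitude / standard amplitude)."""
    i = np.arange(1, M + 1)
    a = np.sin(2 * np.pi * k * i / M)
    spectrum = 1.0 + ratio / np.sqrt(np.mean(a ** 2)) * a
    return 20 * np.log10(spectrum.max() / spectrum.min())

ratio = 0.1414
print(round(20 * np.log10(ratio), 2))     # the ratio expressed in dB
print(round(ripple_depth_db(ratio), 2))   # depth of the Fig. 2 example

# Depth grows monotonically with the ratio (kept below ~0.7 so amplitudes stay positive).
depths = [ripple_depth_db(r) for r in (0.01, 0.05, 0.1414, 0.3)]
```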

Finally, we should note that, for any signal-to-standard ratio, the sum of
the absolute differences in amplitude between successive components increases
monotonically with k. This occurs simply because the difference in amplitude
between successive components approximates the derivative of a[i] (see
Eq. 1), which is proportional to k. Thus, the larger the value of k, the more
ragged the spectrum, and, if we measure raggedness as the rms difference
between the amplitudes of successive components, it changes from 0.21 to 1.41
as k changes from 1 to 10.
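The 0.21 and 1.41 figures can be reproduced under one assumption not stated in the text: that differences are taken cyclically (the last component compared with the first) on the unscaled a[i]. The rms difference then has the closed form √2·sin(πk/M); a non-cyclic version over only the 20 interior differences gives slightly different values at high k.

```python
import numpy as np

def raggedness(k, M=21):
    """rms difference between successive amplitudes a[i] of Eq. 1 (max 1),
    taken cyclically so the last component is compared with the first."""
    i = np.arange(1, M + 1)
    a = np.sin(2 * np.pi * k * i / M)
    diffs = np.roll(a, -1) - a   # a[i+1] - a[i], wrapping at the end
    return float(np.sqrt(np.mean(diffs ** 2)))

for k in range(1, 11):
    closed_form = np.sqrt(2) * np.sin(np.pi * k / 21)
    print(k, round(raggedness(k), 3), round(closed_form, 3))
```

Note that the growth is close to linear in k only for small k; near k = 10 the sine in the closed form saturates.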


RESULTS 

Table 1 presents the data on the threshold for changes in the spectra, using
either sine or cosine ripples and different values of k. The threshold for the
signal is measured in terms of the signal-to-standard ratio and is nearly
constant and independent of k, the frequency of the ripple. The different
changes in spectral shape produced by either the sine or cosine version of
Eq. 1 are equally detectable for all frequencies of ripple. With the exception
of a single point, the k = 9 cosine ripple, the thresholds are within 2 dB of
the same value, namely, -24 dB, for all the conditions tested.

This  is  a  most  unusual  result  in  sensory  psychophysics.  In  almost 
all  studies  of  sensory  systems,  some  change  in  the  modulation  transfer 
function with frequency is evident. In this study, the modulation transfer
function is essentially flat. Obviously, if the total number of components in
the standard signal were increased, then one would reach a
point  where  the  high  frequency  variation  was  not  evident.  This  point 
would  occur  when  successive  components  fell  within  the  same  critical  band 
and,  hence,  could  not  be  resolved.  This  result  would  indicate  a  simple 
low-pass  filter  behavior  for  the  system  and  is  consistent  with  the  fact 
that  the  frequency  resolution  for  the  ear  is  limited. 
Table 1. Average threshold value for different frequencies of ripple.
Entry  is  signal  rms  to  standard  ratio  in  dB. 


Frequency  1 

of Ripple,
k 

3 

2 

5 

9 

7  9 

6  8 

10 

mean 

Sine  -29.6 

-29.0 

-29.6 

-23.9  -25.0 

-29.6 

-29.5 

Cosine  -29.1 

-23.8 

-29.7 

-25.7  -29.3 

-23.0 

-23.2 

-25.7  -23.0 

-22.8 

-29.5 

Our results were obtained with 21 components. The ratio of the frequencies of
successive components is 1.175. Thus, presumably, each component falls in a
separate critical band. If neighboring critical bands were linked with an
excitatory center and inhibitory surround, as a simple lateral inhibition
model might suggest, then we would expect to see better thresholds at some
frequencies of ripple than at others. Our results imply that the local
interactions between different critical bands produce no apparent resonance
as a function of the frequency of ripple. It is unlikely that finer frequency
spacing between the components of the standard will reveal such resonance,
because we are nearing the frequency-resolution limit of the cochlear array,
the critical band. Our spacing is already narrower than one might expect for a
lateral inhibition network. If we look at physiological studies of inhibition,
then we find that the inhibitory side bands are at least one or two critical
bands away from the central components (Sachs and Kiang, 1968).
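To make the expected "resonance" concrete, here is a toy illustration of our own (not a model from the study). A hypothetical center-surround weighting across the 21 channels acts as a filter whose gain depends on the ripple frequency k, so such a network would favor some ripple frequencies over others. The weights in `WEIGHTS` are invented for illustration. The channel array is treated as circular, which is exact here because a ripple with integer k repeats every M components.

```python
import numpy as np

# Hypothetical center-surround weights at channel offsets 0, +/-1, +/-2:
# an excitatory center spreading to the nearest neighbors and inhibitory
# flanks two channels away. Values are illustrative only.
WEIGHTS = {0: 1.0, 1: 0.5, -1: 0.5, 2: -0.5, -2: -0.5}

def network_gain(k, M=21, weights=WEIGHTS):
    """rms output / rms input of the toy network for a ripple of frequency k."""
    i = np.arange(1, M + 1)
    a = np.sin(2 * np.pi * k * i / M)
    # out[i] = sum over offsets d of w_d * a[i - d] (circular shift)
    out = sum(w * np.roll(a, d) for d, w in weights.items())
    return float(np.sqrt(np.mean(out ** 2)) / np.sqrt(np.mean(a ** 2)))

gains = [network_gain(k) for k in range(1, 11)]
best_k = 1 + int(np.argmax(gains))
print([round(g, 2) for g in gains], best_k)
```

For these weights the gain has a clear peak at an intermediate k; a network of this kind would therefore predict a dip in threshold at that ripple frequency, which is the pattern the flat thresholds of Table 1 do not show.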

Before leaving these results, we should also compare the detection of these
ripple signals with the detection of a change in amplitude of a single
component of the standard, a spectral bump. As the results indicate, the
thresholds for the two signals are about 6 dB apart; the threshold for the
single bump is -18 dB, whereas the threshold for ripple is -24 dB. This
difference is far short of what one might expect on the basis of several
models. Perhaps the simplest idea is to assume that a rippled change in a
spectrum is easier to detect than a change in amplitude of a single
component, because the rippled change allows one the opportunity to combine
the output of several independent channels. The standard way to combine such
channels, assuming statistical independence, predicts that the combined
detectability, $d'_c$, will equal the square root of the sum of the squares of
the detectabilities of the separate channels, $d'_i$:
$d'_c = \sqrt{\sum_{i=1}^{21} (d'_i)^2}$. Since there are 21 separate
components, this leads to the expectation that the combined detectability
will be

$$\sqrt{21} \times 0.707 \approx 3.24$$

times better than the detectability of the single channel. The factor 0.707
arises because it is the rms amplitude of the signal in a rippled spectrum
compared with the amplitude of the signal for the single-component bump.
Since detectability (d') for a single-component signal is roughly
proportional to signal voltage, not power, that factor translates into a
difference of about 10 dB, some 4 dB greater than the empirically determined
result. One might, of course, argue that there are not 21 independent
channels but some lesser number, even though the spacing between successive
components is about two critical bands. To achieve the difference of 6 dB,
however, one must assume that only 8 independent channels are combined, a
value that seems unbelievably low.
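The arithmetic of the independent-channel prediction can be laid out as follows. This sketch assumes, as the text does, that d' is proportional to signal amplitude, so a d' advantage factor F lowers the threshold by 20·log10(F) dB. The helper names are hypothetical. The last call covers the case where every component changes at full amplitude (factor 1.0 instead of 0.707), which gives the 13-dB expectation for a 21-component change.

```python
import math

def expected_advantage_db(n_channels, amplitude_factor=0.707):
    """Predicted threshold advantage (dB) from combining n independent
    channels, each carrying amplitude_factor times the single-bump signal."""
    ratio = math.sqrt(n_channels) * amplitude_factor   # d'_c / d'_single
    return 20 * math.log10(ratio)

def channels_needed(advantage_db, amplitude_factor=0.707):
    """Number of independent channels that would yield a given advantage."""
    return (10 ** (advantage_db / 20) / amplitude_factor) ** 2

print(round(math.sqrt(21) * 0.707, 2))           # combination factor, 21 channels
print(round(expected_advantage_db(21), 1))       # rippled spectrum vs. single bump
print(round(channels_needed(6.0), 1))            # channels implied by an observed 6 dB
print(round(expected_advantage_db(21, 1.0), 1))  # all 21 components at full amplitude
```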
To appreciate more fully the problem inherent in the differences in
detectability of these two spectra, let us consider the Weber fraction at each
component frequency for the two spectral changes, a spectral bump and a
one-cycle ripple. Figure 3 shows the Weber fraction plotted as a function of
component number, from 1 to 21, for these two spectral changes. The two
threshold values used in constructing these plots are the average threshold
values measured for these two cases.


Fig. 3. ΔI in dB at each component for two spectral changes (abscissa:
component number).


The signal-to-standard level is -18 dB for the single-component increment at
the eleventh component. This is about the value seen in Fig. 1 for single,
low-frequency increments in a single sinusoid. The rms signal-to-standard
level is -24.5 dB for the rippled spectrum. This is the average value found
for such rippled spectra; see Table 1. The Weber fraction for the
single-component signal is just over 1 dB. The rippled spectrum produces
different Weber fractions, ranging from about -0.7 to +0.7 dB. The rms value
of the Weber fraction is 0.525 dB for the rippled spectrum. One does not need
an elaborate theoretical structure to be puzzled by the fact that these two
patterns of spectral change are nearly equal in detectability. The
combination of two of the larger Weber fractions for the rippled spectrum will
easily exceed the Weber fraction for the single increment, and the rippled
spectrum will have 19 Weber fractions remaining. We can only conclude that the
Weber fractions at the different component frequencies appear to contribute
little to the detectability of spectral change in the case of the rippled
spectra. Nor does this kind of discrepancy exist only for sinusoidal changes
in the spectrum.
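The per-component values behind Fig. 3 can be approximated as below. The sketch assumes that "Weber fraction" here means the level change 20·log10(1 + s_i) in dB produced at component i by an in-phase signal amplitude s_i (standard components at unit amplitude), and that the ripple is the k = 1 sine ripple of Eq. 1. The resulting rms comes out close to, though not exactly at, the 0.525 dB quoted above.

```python
import numpy as np

M = 21
i = np.arange(1, M + 1)

def level_change_db(signal):
    """Level change (dB) at each unit-amplitude standard component when
    `signal` (amplitudes re one standard component) is added in phase."""
    return 20 * np.log10(1.0 + signal)

# Single bump: signal on the 11th component at -18 dB re that component.
bump = np.zeros(M)
bump[10] = 10 ** (-18 / 20)
bump_db = level_change_db(bump)

# One-cycle sine ripple (Eq. 1 with k = 1), scaled to an rms of -24.5 dB.
a = np.sin(2 * np.pi * 1 * i / M)
ripple = 10 ** (-24.5 / 20) * a / np.sqrt(np.mean(a ** 2))
ripple_db = level_change_db(ripple)
ripple_rms_db = float(np.sqrt(np.mean(ripple_db ** 2)))

print(round(bump_db.max(), 2))                               # bump, 11th component
print(round(ripple_db.min(), 2), round(ripple_db.max(), 2))  # ripple range
print(round(ripple_rms_db, 3))                               # ripple, rms across components
```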

Green, Kidd, and Picardi (1983) measured the detectability of sizable changes
in a 21-component spectrum. They compared the detection of a single bump at
950 Hz with a 'downward' or 'upward' step in which all components above or
below 950 Hz were increased or decreased in amplitude. They also used a
rippled spectrum in which alternate components were increased or decreased in
amplitude by the same amount. The average difference in thresholds between the
single-component change and the change produced at all 21 components was
about 5 or 6 dB. In this case, the 0.707 factor does not come into play, and
the expected difference for a 21-component combination is 13 dB. Once again,
the obtained difference in threshold is much less than one would expect if the
channels could be combined in an optimum statistical manner.

The  preceding  analysis  is  premised  on  the  amplitude  of  the  ripple,  or 
some  similar  quantity,  being  the  important  stimulus  feature.  It  is  easy 
to  believe,  a  priori,  that  it  is  not  the  signal  amplitude,  but  rather  the 
difference  in  amplitude  between  successive  components  that  is  really 
important  in  detecting  the  spectral  change.  This  idea  suggests  that  it  is 
the  'step'  created  by  the  elevation  of  a  single  component  that  gives  it 
some advantage in detectability that is not enjoyed by the smoother sinusoidal
ripples. But this line of argument is mistaken. First, as we observed above,
the difference between successive components in the sinusoidal ripple changed
by nearly an order of magnitude as we increased the ripple frequency. The
ragged, high-ripple-frequency spectra are no easier to detect than the single
cycle of sinusoidal variation. Also, as Green and Kidd found, the total number
of steps in the spectra plays essentially no role in determining the
detectability of a spectral change. A single step, either upward or downward,
was not significantly different in detectability from a spectrum that stepped
up and down on successive components (a 20-step spectrum). We simply do not
understand enough about the process of detecting a spectral change to account
for these discrepancies.

LOW  FREQUENCY  RIPPLE  AND  MORE  DENSE  SPECTRAL  PATTERNS 

Lastly, we report on an experiment in which we varied the number of components
used to define the spectra. A low-frequency ripple was used: two cycles of
variation over the range from 200 to 5000 Hz, either sine or cosine. The
independent variable of the experiment was the number of components used to
define this low-frequency ripple. The components of the standard spectrum were
always of equal amplitude. For the sine-rippled spectra, the amplitude of the
components is given by Eq. 1. For any value of the parameter M, the spectra
were constructed so that the ratio of frequencies between successive
components of the pattern was a constant. The specific value of this ratio,
R, can be determined from the formula

$$R = 10^{1.3979/(M-1)}.$$

Thus, for M = 81 components, the ratio is 1.041. This means that the nearest
components to 1000 Hz are 1041 and 961 Hz. For M = 3, the three components of
the spectra are 200, 1000, and 5000 Hz. For the three-component pattern, the
ripple was simply an elevation in the 1000-Hz component.
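The constant 1.3979 is log10(25), the logarithm of the 5000/200 frequency ratio, so R is simply the (M - 1)th root of 25. The sketch below (hypothetical helper name) generates the log-spaced frequencies, assuming the lowest and highest components sit exactly at 200 and 5000 Hz as the text states. It reproduces the M = 81 and M = 3 examples and the 1.175 ratio quoted earlier for 21 components.

```python
import numpy as np

def component_frequencies(M, f_low=200.0, f_high=5000.0):
    """M log-spaced component frequencies (Hz) from f_low to f_high."""
    ratio = 10 ** (np.log10(f_high / f_low) / (M - 1))  # 10**(1.3979/(M-1)) here
    return f_low * ratio ** np.arange(M), ratio

freqs81, r81 = component_frequencies(81)
freqs3, _ = component_frequencies(3)
_, r21 = component_frequencies(21)

print(round(r81, 3), freqs81[39:42].round())  # ratio and the components around 1000 Hz
print(freqs3.round())
print(round(r21, 3))
```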

Figure 4 shows the data as a function of M, the number of components in the
spectra. As can be seen, the threshold for the pattern is elevated if there
are fewer than about 11 components in the spectra. As the number of components
increases, the threshold decreases and becomes nearly independent of the exact
number used. This result can be summarized by saying that the spectral profile
becomes better defined as the number of components in the spectrum increases.
This result is consistent with some previous studies in which we increased the
number of components in the spectral pattern (Green, Kidd, and Picardi, 1983;
Green, Mason, and Kidd, 1984). In the previous studies, however, the signal
was an increment in a single component of the spectrum. Once the density of
components exceeds a certain value, the additional components cause masking at
the signal frequency, as Green and Mason (1985) have shown. In the present
case, the signal is imposed on all the components of the pattern, and no
masking can occur. In this case, the threshold remains constant and
independent of the number of components once some optimal frequency spacing is
exceeded.

Fig. 4. Threshold for a two-cycle ripple as a function of the number of
components in the ripple (abscissa: number of components).


SUMMARY 

The thresholds for changes in a spectral pattern were measured for several
patterns of change. The frequency of sinusoidal variation in the spectral
pattern has little, if any, effect on the detectability of such spectral
change, at least over the frequency range studied. There is no evidence of any
sort of lateral inhibition network. The density of components used to define
the sinusoidal pattern plays no role once about 11 or more components are
used, at least for a very low frequency of variation. A major puzzle of the
experimental findings was the relatively small difference in threshold between
the detection of a sinusoidal change in the spectrum and the detection of an
increase in amplitude of a single component. No explanation of this
discrepancy was completely convincing.
DISCUSSION


YOST 

When the sinusoidal ripple added to a complex sound is on a linear frequency
scale, as it is in rippled noise, listeners are as sensitive to the ripples as
they are in Green's experiments, in which the ripple is on a log frequency
scale (Yost and Hill, 1978; Bilsen and Ritsma, 1970). In addition, results
with rippled noise on linear frequency indicate best sensitivity when the
spacing between the spectral peaks is 200 to 500 Hz. Thus, with linear
sinusoidal ripple there is evidence for a resonance that may be the result of
lateral inhibition.
CARLYON

You  rightly  conclude  from  your  rippled  spectrum  experiment  that  lateral 
inhibitory  networks  do  not  appear  to  be  operating  in  the  conditions  of  your 
experiment.  However,  I  would  like  to  point  out  that  a  lateral  inhibitory 
model such as Shamma's would not work effectively with your stimuli. This is
because your components are not harmonically related (and therefore have
varying phases), whereas the speech-like stimuli on which Shamma bases his
model are harmonic and have non-random phase. Therefore, we can only conclude
that lateral inhibitory networks were not influencing your results,
and  not  that  such  networks  do  not  operate  in  other  circumstances,  such  as 
when  processing  speech. 
INTRODUCTION

There is, unfortunately, a wide gulf between research in psychoacoustics and
research on speech perception. These differences arise, in part, because of
the different objectives of the investigators. Understanding how the auditory
system functions and understanding the speech code are different and distinct
goals. But there are some areas and topics where one might expect a
commonality of interest. The topics of auditory perception and the limits of
certain basic auditory discrimination processes are both areas that should
enjoy mutual interest and concern. But, even here, wide differences are
apparent in the way these topics are approached by the speech scientist and by
the psychoacoustician. These differences are especially evident in the choice
of stimulus materials. The psychoacoustic stimuli are simple; the speech
stimuli are complex. The just-noticeable difference in the frequency or the
amplitude of an isolated pure tone appears to have little to do with how we
recognize differences between vowels or broadband consonants.

The simplicity of psychoacoustic stimuli is understandable, given the
considerable emphasis placed by that discipline on the control of stimulus
intensity. Psychoacoustic stimuli are presented at specific sound pressure
levels, and considerable time and effort are devoted to ensuring that these
levels fall within some small tolerance. A typical limit is some fraction of a
decibel, since the Weber fraction for intensity of a single sinusoidal
stimulus is about 1 dB. The absolute sound level of speech, on the other hand,
is seldom a variable of much concern. Obviously, the sound must be intense
enough to ensure that the listener can hear the utterance. But that condition
can be met over a large intensity range, and 10- or 20-dB differences between
presentation levels may well be regarded as secondary. The reason for such
broad limits is simple to explain: the speech code involves a change in
spectral composition over time and seldom depends on an absolute intensity
level. Relative intensity levels at different regions of the spectrum, the
definition of peaks and valleys in the spectrum, and the frequency region
where the energy is present are thought to be the most important aspects of
the speech code. Indeed, intensity level per se is generally not part of the
speech code; rather, it is used to accent or embellish the utterance.

The preceding observations explain why we find it interesting to study the
ability of the human observer to discriminate changes in the shape of the
spectrum of a complex auditory stimulus. Such studies, we hope, will provide
us with basic information as to how the auditory sense operates and will begin
to contribute to our understanding of speech perception, which, after all, is
the primary function of the auditory process. In order to understand our
research on the discrimination of changes in spectral shape and, in
particular, how it differs from previous studies of intensity discrimination,
we must first consider the intensity discrimination task in some detail.


FIGURE 1. Pure intensity discrimination, in which the observer discriminates
a standard stimulus, p(t), from a scaled version, a·p(t). In the frequency
domain (right side of the figure), the effect of scaling is simply to displace
the spectrum along the ordinate. The temporal waveforms (left side) are
identical except for the scaling factor.


Let us first consider the simplest intensity discrimination task, what we
might call "pure" intensity discrimination. The two sounds used in the
discrimination task are either one pressure wave, p(t), or a scaled version of
that same wave, a·p(t), where the constant, a, is not unity. In the frequency
domain, the two spectra are simply displaced from one another along the
ordinate, assuming we have plotted the spectra on a logarithmic intensity
scale, such as decibels. The discrimination problem is to select between the
two spectra. Pure intensity discrimination, such as that illustrated in
Figure 1, may be contrasted with a different task, that of discriminating a
change in spectral shape, what we call "profile analysis". The stimuli to be
discriminated in this task are illustrated in Figure 2. The two pressure
waves, p1(t) and p2(t), may be completely unrelated. Since the waveshapes are
different, the spectra of the two sounds will also be different, as
illustrated in the right-hand portion of Figure 2. Although the shapes of the
spectra differ, the listener might use differences in intensity at a
particular frequency region in order to achieve the discrimination between the
two stimuli. Unless some special precautions are taken, there is nothing to
prevent the listener from discriminating a change in spectral shape on the
basis of some difference in intensity at some particular frequency region.
Thus, the experimenter could not, in general, guarantee that the observer's
performance in discriminating a change in spectral shape is in any way
different from discriminating a change in intensity.
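The statement that scaling only displaces a dB spectrum along the ordinate can be checked with a discrete Fourier transform. In this sketch the three-tone p(t), the 20-kHz sample rate, and the scale factor a = 2 are all arbitrary choices; a 1-s buffer puts each tone exactly on an FFT bin, so every spectral line shifts by the same 20·log10(a) dB.

```python
import numpy as np

fs = 20000                       # sample rate (Hz), arbitrary choice
t = np.arange(fs) / fs           # 1 s, so FFT bins fall on integer Hz
rng = np.random.default_rng(0)
freqs = [200, 1000, 5000]        # hypothetical three-tone p(t)
p = sum(np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi)) for f in freqs)

def level_db(x, f):
    """Level (dB, arbitrary reference) of the spectral line at f Hz."""
    return 20 * np.log10(np.abs(np.fft.rfft(x))[f])

a = 2.0                                   # scale constant, not unity
shifts = [level_db(a * p, f) - level_db(p, f) for f in freqs]
# Every component is displaced by the same amount, 20*log10(a) dB.
print([round(s, 2) for s in shifts])
```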


FIGURE 2. Spectral shape discrimination. The stimuli to be discriminated,
p1(t) and p2(t), may be completely unrelated. Thus, in the frequency domain,
their spectral shapes differ. The temporal waveforms also differ. (Left: in
time; right: in frequency.)


The special experimental manipulation that ensures that shape, not absolute
intensity, is the critical cue in spectral shape discrimination is illustrated
in Figure 3. It is randomizing the overall intensity level. On each and every
presentation of the stimulus, the level at which it is presented is chosen at
random. Thus, the scale constants, a1 and a2, are random variables, as the
figure indicates. If the range of these random variables is sufficiently
large, the stimuli heard in the discrimination task will clearly differ in
intensity, and the observer will be forced to compare some other aspect of the
stimuli to distinguish between them. In our case, that difference is the shape
of the auditory spectra. The minimal comparison that must be made to achieve
such discrimination is that the listener measures the sound levels in two or
more different parts of the spectra and simultaneously compares them. The
absolute sound level of these measurements is largely irrelevant, because the
stimuli change in absolute level on each and every presentation.

FIGURE 3. Stimuli to be discriminated are scaled by random variables (a1, a2)
on each and every presentation to ensure that discrimination is based on
spectral shape rather than intensity.

The differences in the structure of these two discrimination tasks force the
observer to use somewhat different discrimination processes. In pure
intensity discrimination, the listener must construct some estimate of
absolute intensity level and either compare two such estimates made at
different times or compare a single estimate with some long-term standard. In
spectral shape discrimination, a simultaneous comparison of two or more
spectral regions must be made, yielding an estimate of relative level; an
estimate of absolute level on any single presentation is largely irrelevant,
because it is confounded by the randomization of overall level.
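A small simulation shows why the level rove forces a relative comparison. It assumes an idealized, noiseless observer and a hypothetical 2-dB increment on the central (11th) of 21 components; the rove of 40 dB in 1-dB steps around a 50-dB median follows the procedure section of this report. An observer who uses only the target's absolute level performs barely above chance, while one who compares the target with the rest of the profile is perfect.

```python
import numpy as np

rng = np.random.default_rng(1)
M, TARGET = 21, 10          # 21 components; signal on the 11th (index 10)
SIGNAL_DB = 2.0             # hypothetical level increment at the target (dB)

def present(with_signal):
    """Component levels (dB SPL) on one presentation. The overall level is
    roved over a 40-dB range in 1-dB steps around a 50-dB median."""
    levels = np.full(M, 50.0) + rng.integers(-20, 21)
    if with_signal:
        levels[TARGET] += SIGNAL_DB
    return levels

def absolute_cue(levels):
    return levels[TARGET]                   # level of the target alone

def relative_cue(levels):
    others = np.delete(levels, TARGET)
    return levels[TARGET] - others.mean()   # target re the rest of the profile

def percent_correct(cue, n_trials=2000):
    """Noiseless 2AFC observer: pick the interval with the larger cue; guess on ties."""
    score = 0.0
    for _ in range(n_trials):
        standard, signal = present(False), present(True)
        diff = cue(signal) - cue(standard)
        score += 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
    return 100 * score / n_trials

print(percent_correct(absolute_cue), percent_correct(relative_cue))
```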

What we would like to do in this paper is review some of our research on this
topic and especially emphasize what we have learned about how such spectral
comparisons operate. As psychoacousticians, our primary interest is in how the
auditory sense works, but we feel these experiments may provide some insight
into how complex spectral discriminations are made in speech waveforms.

Procedure  and  Stimulus  Conditions 

Before proceeding with a description of the individual experiments, let us
outline the procedure and stimulus conditions used in the research and explain
why these experimental conditions were chosen. For almost all of the studies,
we use a multitonal complex. The stimuli generally cover the speech range,
from 200 to 5000 Hz. The frequencies of the individual components are not,
however, harmonic, as they are in speech. The tones are chosen so that
successive components are equally spaced on a logarithmic frequency axis.
Thus, the frequency ratio of successive components is a constant. The reason
for choosing logarithmic spacing is as follows. We know the cochlea achieves a
rough Fourier analysis of the stimulus in which different places along the
basilar membrane are maximally sensitive to different frequencies. Roughly,
this linear array is arranged so that equal spatial extent is coded as equal
differences in logarithmic frequency. Our tones, therefore, provide a uniform
stimulus over the linear receptor surface of the cochlea.

A typical discrimination task involves two stimuli: a "standard" complex and
some alteration of the standard complex, which we achieve by adding a "signal"
to the standard. The signal itself consists of the in-phase addition of energy
at one or more components to the corresponding component or components in the
standard complex. We use equal-amplitude tones for our standard because the
observers learn this standard easily. Thus, little training is needed in order
to study various alterations from this standard. We use a two-alternative
forced-choice procedure and adaptively change the level of the signal to
estimate the level that would yield 70.7% correct. Overall intensity is
typically chosen at random over a 40-dB range in 1-dB steps. The median level
is usually about 50 dB SPL per component.
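The text does not name its adaptive rule; 70.7% correct is the point targeted by a two-down, one-up track (lower the signal after two consecutive correct responses, raise it after each error), so this sketch assumes that rule. The simulated observer's logistic psychometric function, its midpoint and spread, the 2-dB step, the starting level, and the trial count are all hypothetical choices for illustration.

```python
import numpy as np

def two_down_one_up(p_correct, start_db=0.0, step_db=2.0, n_trials=400, seed=0):
    """Adaptive 2AFC track converging on about 70.7% correct.

    Returns the mean signal level at the reversals, discarding the first four.
    """
    rng = np.random.default_rng(seed)
    level, run, direction, reversals = start_db, 0, 0, []
    for _ in range(n_trials):
        if rng.random() < p_correct(level):      # simulated correct response
            run += 1
            if run == 2:                          # two in a row: step down
                run = 0
                if direction == +1:
                    reversals.append(level)
                direction = -1
                level -= step_db
        else:                                     # any error: step up
            run = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level += step_db
    return float(np.mean(reversals[4:]))

def observer(level_db, midpoint=-20.0, spread=2.0):
    """Hypothetical 2AFC psychometric function (chance = 50%)."""
    return 0.5 + 0.5 / (1 + np.exp(-(level_db - midpoint) / spread))

estimate = two_down_one_up(observer)
true_707 = -20.0 + 2.0 * np.log(0.414 / 0.586)   # level where observer() = 0.707
print(round(estimate, 1), round(true_707, 1))
```

Averaging reversal levels is one common way to summarize such a track; other studies average the last few reversals only or fit the psychometric function directly.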

In the studies reported here, the dependent variable is the level of the
signal (the size of the increment) re the level of the corresponding component
or components in the background. For example, if the level of the signal is
equal to the level of the corresponding component(s) of the standard, then we
say the signal-to-standard ratio is 0 dB. In that case, the component to which
the signal is added would be increased in level by 6 dB. In many studies, the
signal is simply an increase in the intensity of a single component. But other
changes have been studied as well, such as a variation in the amplitudes of
all components of the standard. In the following, we recount some of the
things we have learned about the perception of a change in the shape of such a
complex auditory spectrum.
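Because the signal is added in phase, a signal-to-standard ratio of S dB raises the component by 20·log10(1 + 10^(S/20)) dB. The sketch below (hypothetical helper name) evaluates this for a few ratios; 0 dB gives the 6-dB increase mentioned above, and -18 dB gives an increase of about 1 dB.

```python
import math

def increment_db(signal_to_standard_db):
    """Level increase (dB) of a component when a signal with the given
    signal-to-standard ratio (dB) is added to it in phase."""
    amplitude_ratio = 10 ** (signal_to_standard_db / 20)
    return 20 * math.log10(1 + amplitude_ratio)

for s in (0, -10, -18, -24):
    print(s, round(increment_db(s), 2))
```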

Effects  of  phase 

In  most  of  the  experiments  concerning  profile  analysis,  the 
phase  of  each  component  of  the  multitonal  complex  has  been  chosen  at 
random  and  the  same  waveform  (except  for  random  variation  of  level) 
is  presented  during  each  "non-signal"  interval.  Therefore,  the 
possibility  exists  that  observers  may  recognize  some  aspect  or  aspects 
of  the  temporal  waveform.  If  this  were  true,  then  discrimination  could 
be  based  on  some  alteration  of  the  temporal  waveform  during  the 
"signal"  interval  rather  than  by  a  change  in  the  spectral  shape  of  the 
stimulus  per  se. 

Green  and  Mason  (1985)  investigated  this  possibility  directly. 
Multicomponent complexes were generated that consisted of 5, 11, 21, or 43
components spaced logarithmically. In all cases, the frequency of the lowest
component was 200 Hz, and the highest was 5 kHz. The overall level of the
complex was varied randomly over a 40-dB range across presentations, with a
median level of 45 dB SPL. The signal consisted of an increment to the
central, 1-kHz component.

In what Green and Mason termed the "fixed-phase" condition, for each number of
components (5, 11, 21, and 43), four different standard waveforms were
generated by randomly selecting the phases of each component of the complex.
Each of these standards was fixed for a block of trials, and signal thresholds
were obtained for each of the different randomizations. Note that for these
fixed-phase conditions, the same waveform, except for changes in overall
level, occurred during each non-signal interval.

In what Green and Mason called the "random-phase" conditions, for each value
of the number of components (5, 11, 21, and 43), 88 different standard
waveforms were generated by randomly selecting the phase of each component of
the complex. On each presentation of every trial, pairs of these 88 waveforms
were selected at random (with replacement). Thus, the temporal waveforms
generally differed on each presentation. The amplitude or power spectra of the
stimuli were, however, identical.
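The sketch below illustrates the point of the random-phase condition: two equal-amplitude multitone complexes with independently random component phases have different waveforms but identical amplitude spectra. It assumes 21 log-spaced components from 200 to 5000 Hz, rounded to whole hertz so that a 1-s buffer places each tone exactly on an FFT bin; only two waveforms are drawn rather than the 88 used in the study.

```python
import numpy as np

fs = 20000
t = np.arange(fs) / fs        # 1-s buffer; FFT bins are 1 Hz apart
# 21 log-spaced tones from 200 to 5000 Hz, rounded to whole Hz (an assumption)
freqs = np.round(200 * 25 ** (np.arange(21) / 20)).astype(int)

def random_phase_complex(rng):
    """Equal-amplitude multitone with independently random component phases."""
    phases = rng.uniform(0, 2 * np.pi, size=freqs.size)
    return np.sum(np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None]), axis=0)

rng = np.random.default_rng(2)
x1, x2 = random_phase_complex(rng), random_phase_complex(rng)
spec1 = np.abs(np.fft.rfft(x1))[freqs]   # amplitude at each component
spec2 = np.abs(np.fft.rfft(x2))[freqs]
print(np.allclose(spec1, spec2), np.allclose(x1, x2))
```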

The results are presented in Figure 4. For each value of component number, the
open circles represent the thresholds obtained for each of the four
randomizations in the fixed-phase condition. The solid triangles represent the
data obtained in the random-phase conditions. The results indicate that
changing the phase of the individual components, and thus the characteristics
of the temporal waveform, has little, if any, effect on discrimination. This
is true whether the same phase is used for a block of trials or the waveform
is chosen at random on each and every presentation. These data are consistent
with those obtained by Green, Mason, and Kidd (1984), who had generated their
waveforms using a procedure similar to the fixed-phase condition. The form of
the function relating threshold to the number of components that compose the
multicomponent background will be discussed in detail in a subsequent section.

The inability of changes in the phase of the individual components,
and thus changes in the characteristics of the temporal waveform, to
affect discrimination supports the view that, in these tasks, observers
are indeed basing their judgments on changes in the power spectra of
the stimuli.


FIGURE 4. Signal threshold (dB) as a function of the number of
components in the complex. Open circles represent the data obtained for
each of the four phase randomizations when the phase of each component
was fixed throughout a block of trials ("fixed-phase" condition). Filled
triangles represent the data from the "random-phase" condition, in
which the phases of the components were chosen at random on each
presentation.


Frequency Effects

So far we have demonstrated that the detection of changes in the
shape of a complex auditory spectrum is based on changes in the power
spectrum of the stimulus; the phase relation among the components is
unimportant. The next question we consider is whether the ability to
detect a change in the power spectrum is greatly influenced by the
frequency region where the change occurs. Consider our complex standard
composed of a number of sinusoidal components. Suppose we alter that
standard spectrum by increasing the intensity of a single sinusoid. A
natural question is whether the frequency locus of the change greatly
affects the ability to detect it. The answer to this question settles
an important practical issue: to what degree are different frequency
regions homogeneous? In speech, at least for vowels, the significant
spectral changes typically occur within the range of 500 to 2000 Hz. As
far as we are aware, there is no claim that small alterations of the
spectrum are better detected in one frequency region than in another.
Thus, we would be surprised to find that the ear's ability to detect a
small change in the spectrum differs greatly as a function of frequency.

This question is also of basic interest in psychoacoustics, because
it bears on the question of intensity coding and whether or not
temporal factors, such as the synchrony of discharge patterns, are
utilized as part of the intensity code. Sachs and Young (1979) and
Young and Sachs (1979) have demonstrated that "neural spectrograms"
based on neural synchrony measures preserve the shape of speech spectra
better than those based on firing-rate codes. We were, therefore,
particularly interested in how well the observers could detect a change
in spectral shape at higher frequencies. At the highest frequencies,
above 2000 Hz, neural synchrony deteriorates and, if that code were used
to signal changes in spectral shape, then the ability to detect such
alterations in the acoustic spectrum should also deteriorate. Certainly,
differences among vowels are not signaled by changes in the location of
higher-frequency formants. But, in the case of speech, this frequency
limitation may be the result of the production system, that is, the
coding system, not the decoding system.


FIGURE 5. Signal threshold (dB) as a function of the frequency of the
signal. A twenty-one-component complex was used as the standard. The
frequency of the lowest component was 200 Hz; the frequency of the
highest component was 5000 Hz. The signal, whose frequency is indicated
on the abscissa, was added in phase to the corresponding component in
the complex.

In a previous study (Green and Mason, 1985), we had measured how the
locus of the frequency change affects the ability to detect a change in
complex spectra. These results suggested that the mid-frequency region,
500 to 2000 Hz, was the best, but variability among the different
observers was sizable. Also, those data were taken after a previous
experiment in which the signals were in the middle of that range.
Although extensive training at all the frequencies tested was given in
the later experiment, it is conceivable that some of the data were
influenced by the preceding experiment. In any case, the recent move of
our laboratory provided an opportunity to recruit a new set of listeners
who were truly naive with respect to the parameter of interest.

The results of our most extensive experiment on this issue (Green,
Onsan, and Forrest, 1987) are shown in Figure 5. The standard spectrum
was a complex of 21 components, all equal in amplitude and equally
spaced in logarithmic frequency. The overall level of the standard was
varied over a 20-dB range with a median value of 60 dB. The signal,
whose frequency is plotted along the abscissa of the figure, was an
increment in the intensity of a single component. The ordinate, like
that of Figure 4, is again the signal level re the level of the
component to which it was added. The results show that the best
detection occurs in a frequency range of 300 to 3000 Hz, with only a
mild deterioration at the higher and lower frequencies. These results
give little support to the idea that neural synchrony is used to
estimate intensity level, because, were that the case, there should be
a more marked deterioration in the ability to hear a change in the
spectrum as a function of frequency.


PROFILE ANALYSIS AND THE CRITICAL BAND

The evidence presented thus far suggests that the detection of a change
in spectral shape, or profile analysis, is a "global" process relying on
simultaneous comparisons in two or more regions of the spectrum. An
issue of central concern is the width of the spectrum over which these
comparisons can be made. If one were to invoke classical "critical-band"
notions, which pervade much of psychoacoustic research, one would expect
that only frequencies close to the frequency of the signal could be used
in detecting an increment.

Green, Mason, and Kidd (1984) obtained data that address this issue.
In their experiment, the signal consisted of an increment to the 1-kHz
central component of a multitonal complex. The multitonal complexes
consisted of equal-amplitude, logarithmically spaced components. In the
first condition, which we will refer to as the "range" condition, the
standard consisted of a three-component complex. The parameter was the
range of frequencies spanned by the standard, that is, the separation in
frequency between the two components that flanked the central, 1-kHz
component.

In the second condition, which we will call the "range/number"
condition, the number of components as well as the range was varied.
Additional flanking components were added to the complex, resulting in
multitonal complexes of 3, 5, 7, 9, and 11 components. These additional
components increased the range of frequencies covered by the standard.

FIGURE 6. Signal threshold (dB) as a function of the logarithm of the
ratio of frequencies spanned by the complex. Open circles represent the
data obtained from the "range" condition, in which each complex
comprised three components. The signal was always added to the central
component of the complex, a 1000-Hz component. The numbers at the top of
the graph give the frequencies of the other two components of the
complex. Solid squares represent data obtained from the "range/number"
condition, in which the number of components in the complex and the
range were varied. Again, the signal is an increment in the central
component. From the left-most portion of the graph, the squares
represent complexes comprising 3, 5, 7, 9, and 11 components,
respectively.


The results of these two conditions are presented in Figure 6. The
abscissa is the logarithm of the ratio of the highest to the lowest
component in each complex. The data obtained in the range condition,
with the three-component complexes, are plotted as open circles. The
solid squares represent the data obtained in the range/number condition,
when the range of frequencies spanned by the complex and the number of
components covaried. Each point is the mean of six estimates of
threshold obtained from the three subjects who participated. The error
bars represent the mean of the standard errors computed for each
observer.

Focusing on the data obtained in the range/number condition (solid
squares), it is clear that as the number of components is increased,
performance improves by 10 dB or more. Although only a small improvement
is realized when the number of components increases beyond seven, the
data obtained with seven components indicate that tones about 1.4
octaves away from the central, 1-kHz component (2626 Hz and 380 Hz) have
a dramatic effect on performance. This result is in conflict with
"critical-band" notions, which would predict that energy at frequencies
remote from the signal would have little effect on its detection.
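The frequencies quoted above can be reproduced with a short calculation. This sketch assumes that the 3- to 11-component complexes of the range/number condition are the central members of the same equal-log-spaced grid as the familiar 11-component, 200-5000 Hz complex (successive ratio 25^0.1, about 1.38). That assumption is ours, but it reproduces the 380- and 2626-Hz outer tones of the seven-component complex. The helper `central_subset` is hypothetical, and the base-10 logarithm for the Figure 6 abscissa is also an assumption.

```python
import math

F_LO, F_HI, N = 200.0, 5000.0, 11

# Ratio between successive components: (5000/200) ** (1/10) = 25 ** 0.1
ratio = (F_HI / F_LO) ** (1 / (N - 1))
freqs = [F_LO * ratio ** k for k in range(N)]
center = N // 2                      # index 5, i.e. 1000 Hz

print(f"ratio between successive components = {ratio:.3f}")
print(f"neighbors of 1 kHz: {freqs[center - 1]:.1f} Hz and {freqs[center + 1]:.1f} Hz")

def central_subset(n_components):
    """Hypothetical helper: the n components of the 11-component grid centered on 1 kHz."""
    half = n_components // 2
    return freqs[center - half:center + half + 1]

for n in (3, 5, 7, 9, 11):
    sub = central_subset(n)
    span = math.log10(sub[-1] / sub[0])             # log of high/low ratio (base 10 assumed)
    octaves = math.log2(sub[-1] / freqs[center])    # outer tone's distance from 1 kHz
    print(f"{n:2d} components: {sub[0]:6.1f}-{sub[-1]:6.1f} Hz, "
          f"log10(high/low) = {span:.3f}, outer tones {octaves:.2f} octaves from 1 kHz")
```

The same grid also gives the 724- and 1379-Hz neighbors of the 1-kHz component that are quoted later for the 11-component complex.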

The data obtained with the three-component complexes (open circles)
also indicate that increasing only the range of the complex improves
performance, but not to the extent found when the number of components
is also increased.

In short, for a given frequency range, performance improves when the
number of components that make up the profile, that is, its density, is
increased. This result was also obtained by Green, Kidd, and Picardi
(1983). Their data showed, in addition, that if the density of
components in the complex is great enough, several components fall very
close to the frequency of the signal and detection performance
deteriorates. Such an outcome is explained by simple masking, and its
existence supports the critical-band concept. In such a case, the
additional components fall within a critical band surrounding the
frequency of the signal component, and thus an increment to the signal
component produces a relatively smaller increase in power in its region
of the spectrum.

In summary, the conflict with classical "critical-band" concepts
arises because energy at frequencies remote from that of the signal
influences performance in these tasks. The data confirm the notion that
profile analysis is a global process that relies upon the integration of
information across many critical bands.

Profile Analysis versus Simple Intensity Discrimination

In the concluding section of this paper, we compare the acuity of
discriminating a change in the shape of a complex spectrum with the
acuity of detecting a change in absolute intensity level. As reviewed in
the first section of this paper, one may distinguish two separate
processes for comparing intensity in a complex spectrum. The first we
called pure-intensity discrimination; this process detects a change in
absolute intensity level. The acuity of this process can be measured in
tasks where the spectrum of the signal does not change its shape, but is
simply altered in level. We have contrasted this process with the
detection of a change in the shape of the complex auditory spectrum,
what we have called profile analysis. In detecting a change in spectral
shape, the process must be one of simultaneous comparisons of intensity
levels at different regions of the spectrum, because random variation in
the overall level of the spectrum on successive presentations renders
the use of absolute level on any presentation an ineffective strategy.
For a fixed change in intensity, can one hear that change best in a
pure-intensity discrimination task or in a profile task? A clear answer
to this question is of some practical importance, because for many
naturally occurring stimuli, such as complex speech spectra, both
processes are potentially available. Presumably, the observer uses
either a combination of the two systems or the more sensitive system
alone. To predict performance in a variety of realistic situations, one
would have to know the relative sensitivity of the two systems.
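A toy illustration of why a random overall-level rove defeats an absolute-level cue but not a spectral-shape cue. It assumes component levels in dB, a hypothetical 2-dB increment, and a hypothetical "shape" statistic (the level of the central component minus the mean level of the other components); the 40-dB rove matches the first experiment described earlier. This is a sketch of the argument, not the authors' model of the observer.

```python
import numpy as np

rng = np.random.default_rng(1)

N_COMP = 11          # components in the complex
SIGNAL_IDX = 5       # index of the central (1-kHz) component
BASE_LEVEL = 45.0    # level of every component before the rove, dB (illustrative)
ROVE_RANGE = 40.0    # total range of the random overall-level rove, dB
INCREMENT_DB = 2.0   # hypothetical level increase of the signal component, dB

def presentation(with_signal):
    """Component levels (dB) for one presentation, including a random rove."""
    levels = np.full(N_COMP, BASE_LEVEL)
    if with_signal:
        levels[SIGNAL_IDX] += INCREMENT_DB
    rove = rng.uniform(-ROVE_RANGE / 2, ROVE_RANGE / 2)
    return levels + rove     # the rove shifts every component by the same amount

def absolute_cue(levels):
    """Level of the signal component alone (a pure-intensity cue)."""
    return levels[SIGNAL_IDX]

def shape_cue(levels):
    """Signal-component level relative to the mean level (in dB) of the others."""
    others = np.delete(levels, SIGNAL_IDX)
    return levels[SIGNAL_IDX] - others.mean()

standards = [presentation(False) for _ in range(1000)]
signals = [presentation(True) for _ in range(1000)]

# The shape cue is unaffected by the rove: 0 dB for every standard, 2 dB for every signal.
print(max(abs(shape_cue(s)) for s in standards), min(shape_cue(s) for s in signals))

# The absolute cue for standards alone spreads over nearly the whole 40-dB rove,
# so a 2-dB increment cannot be read from the level of a single presentation.
abs_levels = [absolute_cue(s) for s in standards]
print(max(abs_levels) - min(abs_levels))
```

Because the rove adds the same number of decibels to every component, any across-frequency level difference cancels it exactly, which is why profile tasks force simultaneous comparisons.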

Comparison of detection performance in the two situations is,
however, complicated by the issue of prior training and experience. The
situation is not unlike that of testing the ability of observers to hear
some phonemic distinction in a particular language. If one uses a group
of subjects whose native language uses this distinction, then one may
expect finer discrimination capacity from that group than from another
group of subjects whose native language does not use this distinction.
Similarly, we have observed that listeners with a long history of
training in pure-intensity discrimination experiments often do poorly
when first confronted by a task involving the detection of a change in
spectral shape. It is also true that observers who are well practiced in
detecting changes in spectral shape often find detection of simple
intensity changes initially difficult. Recently, a well-trained profile
observer complained, when asked to discriminate a change in the
intensity of a single sinusoid, that the only thing he could listen for
was a change in loudness!

A second factor that makes the comparison of detection performance
in the two tasks difficult is that there is more range in the ability of
different people to hear simple changes in intensity level than is
usually admitted in the literature. The impression that the Weber
fraction is nearly constant over individuals is created largely by the
use of a very compressive measure of the Weber fraction in dB,
$10\log_{10}(1 + \Delta I/I)$. One often reads that the Weber fraction is
about 1 dB. What is not appreciated is that a change from 0.5 to 1.5 dB
corresponds to a 10-dB change on the scale of signal-to-background level
that we have commonly used in profile experiments. Individual
differences among listeners are sizable. Using our scale of
signal-to-component level, we often find differences of 10 dB among
individuals in both pure-intensity discrimination tasks and profile
tasks.
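The claim that 0.5 to 1.5 dB spans about 10 dB on the signal-to-component scale can be checked directly. This sketch assumes the signal is added in phase to a component of amplitude a, so that 1 + ΔI/I = (1 + Δa/a)², the Weber fraction in dB is 20 log10(1 + Δa/a), and the signal-to-component level (the ordinate of Figures 4 and 5) is 20 log10(Δa/a). The function names are hypothetical.

```python
import math

def weber_db_to_signal_re_component(delta_l_db):
    """Level increment 10*log10(1 + dI/I) in dB -> signal level re the component,
    20*log10(da/a), assuming in-phase addition so that 1 + dI/I = (1 + da/a)**2."""
    da_over_a = 10 ** (delta_l_db / 20) - 1
    return 20 * math.log10(da_over_a)

def signal_re_component_to_weber_db(signal_db):
    """Inverse: signal level re the component, 20*log10(da/a) -> 10*log10(1 + dI/I)."""
    da_over_a = 10 ** (signal_db / 20)
    return 20 * math.log10(1 + da_over_a)

lo = weber_db_to_signal_re_component(0.5)
hi = weber_db_to_signal_re_component(1.5)
print(f"0.5 dB -> {lo:.2f} dB, 1.5 dB -> {hi:.2f} dB, difference {hi - lo:.2f} dB")
```

Under this assumption the two Weber fractions map to roughly -24.5 and -14.5 dB re the component, about 10 dB apart, which is the compression the paragraph above describes.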

A final complication is that the observers we use in most of our
profile tasks are not a random selection from the population; rather,
they are selected on the basis of previous listening performance. Some
observers find it extremely difficult to hear the change in shape of a
complex spectrum. While they improve with practice, it does not appear
likely that they will ever be useful participants in a series of
experiments involving the comparison of thresholds obtained in a variety
of different experimental conditions. Our usual procedure is to train
and test subjects over a period of one or two days (two hours of
listening per day) on the detection of an increment in a 1000-Hz tone in
an 11- or 21-component complex. For the listener to continue in these
experiments, we require that detection performance reach the -10 to
-20 dB range at the end of two or three days. In general, we believe
that practically all subjects could be trained to reach this level of
performance, but if more than three days are required, we feel that
such observers would require an excessive amount of training throughout
the various conditions of the experiment.

A direct comparison of the relative sensitivity of two groups of
listeners was recently made by Green and Mason (1985). They compared two
groups of observers: five experienced in profile listening and five who
were not. The five inexperienced profile listeners had considerable
training in tasks that could be classified as pure-intensity
discrimination tasks. The thresholds for the ten observers were measured
in two detection tasks, a pure-intensity discrimination task and a
profile task.

The pure-intensity discrimination task was the detection of a change
in the level of a 1000-Hz sinusoid. The sinusoid was fixed in level at
40 dB SPL. The profile task was the detection of a change in the
intensity of that same component, but the 1000-Hz component was
surrounded by 10 other, equal-amplitude components. We used the
familiar 11-component complex (200-5000 Hz). To make the tasks
comparable, the level of all the components was also fixed, on each and
every presentation, at 40 dB SPL. The ratio of the frequencies of
successive components of the complex was approximately 1.38. Thus, the
two neighbors of the 1000-Hz component had frequencies of 1379 and
724 Hz. The signal duration in both tasks was 100 ms. The thresholds
were estimated from the mean of 6 runs of 50 adaptive trials (two-down,
one-up). Table 5-1 presents the thresholds estimated in the two tasks
for the ten observers.

Table 5-1

Entry is the relative signal threshold in dB (standard error of estimate).
Diff (SS-P) is the single-sinusoid threshold minus the profile threshold.

Observer   Single sinusoid   Profile        Diff (SS-P)

Experienced profile listeners
1          -10.5 (1.4)       -18.6 (1.7)       8.1
2           -6.4 (2.0)       -13.6 (0.6)       7.2
3          -12.0 (0.8)       -18.5 (1.3)       6.5
4          -11.2 (1.3)       -15.8 (1.2)       4.6
5          -18.0 (1.5)       -22.7 (2.3)       4.7
mean       -11.6 (1.4)       -17.8 (1.4)       6.2

Inexperienced profile listeners
6          -20.0 (1.6)       -10.9 (2.2)      -9.1
7          -13.2 (2.0)       -12.3 (1.6)      -0.9
8          -19.7 (1.0)        -9.2 (1.3)     -10.5
9          -14.0 (1.0)       -10.0 (1.6)      -4.0
10         -17.4 (0.8)       -20.2 (1.4)      +2.8
mean       -16.9 (1.6)       -12.5 (1.6)      -4.3

As can be seen in the table, there is an almost perfect interaction
between thresholds in the two tasks and previous training. The best
average detection performance is about -17 dB for both groups, but it
occurs in different conditions. For the experienced profile listeners,
it occurs in the profile condition. For the inexperienced profile
listeners, it occurs in the single-sinusoid condition. The average
difference between performance on the favored and unfavored task is
also very similar in the two groups, about 5 dB. The pattern of
interaction between past listening experience and the two detection
tasks is reflected by nearly every individual observer, with one
singular exception (Observer 10). That observer, whose performance level
is good on both tasks, is somewhat better on the profile task despite
the lack of previous experience. Note the range of thresholds obtained
for either group within each task. Such differences among individuals
are typical.
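The group figures in the paragraph above can be recomputed from Table 5-1. This short sketch uses -12.3 dB for observer 7's profile threshold, the value consistent with that observer's difference score (-0.9 dB) and with the group mean (-12.5 dB).

```python
# Thresholds from Table 5-1, dB re component level: (single sinusoid, profile).
experienced = {1: (-10.5, -18.6), 2: (-6.4, -13.6), 3: (-12.0, -18.5),
               4: (-11.2, -15.8), 5: (-18.0, -22.7)}
inexperienced = {6: (-20.0, -10.9), 7: (-13.2, -12.3), 8: (-19.7, -9.2),
                 9: (-14.0, -10.0), 10: (-17.4, -20.2)}

def summarize(group):
    """Mean threshold in each task and per-observer differences (single - profile)."""
    n = len(group)
    mean_ss = sum(ss for ss, _ in group.values()) / n
    mean_prof = sum(p for _, p in group.values()) / n
    diffs = {obs: round(ss - p, 1) for obs, (ss, p) in group.items()}
    return mean_ss, mean_prof, diffs

for name, group in (("experienced", experienced), ("inexperienced", inexperienced)):
    mean_ss, mean_prof, diffs = summarize(group)
    print(f"{name}: single sinusoid {mean_ss:.1f} dB, profile {mean_prof:.1f} dB, "
          f"mean SS-P {mean_ss - mean_prof:+.1f} dB, per observer {diffs}")
```

The favored-task means come out near -17.8 dB (experienced, profile) and -16.9 dB (inexperienced, single sinusoid), and the favored-task advantages near 6.2 and 4.3 dB, in line with the "about -17 dB" and "about 5 dB" stated in the text.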

Presumably, with enough training, both groups would improve on the
unfamiliar task, but, unfortunately, we have no firm data to support
that conjecture. Informally, we tried to improve the performance of the
inexperienced profile listeners in the profile tasks, but their
thresholds did not improve very much after an additional 2000 trials.
We are still uncertain about how best to interpret this result. The
interaction present in the data reflects either a difference in
training or real individual differences among observers. It may be that
differences in past experience simply cannot be overcome by a few
thousand trials. One could argue that it is like trying to hear an
acoustic distinction that is not used in one's native language.
Alternatively, it is possible that there are simply two different types
of observers: one type good at discriminating changes in absolute
intensity, another good at discriminating changes in spectral shape.

While it is unlikely that a random sample of ten individuals would
divide so perfectly, one cannot claim that the profile group was a
completely random sample. As described previously, some preliminary
testing was completed before selecting this group, and such tests could
indeed have biased the group toward good "profile" listeners. The
difference in their performance in the two tasks is reasonably uniform
over all the observers experienced in profile listening. The group
inexperienced in profile listening was probably a more random selection
from the general population, and their results are more mixed.
Observers 7 and 10 show little difference in their performance in the
two tasks (their difference scores are -0.9 and +2.8 dB, respectively).
Whether there are really different types of observers or simply
differences in past experience remains a fascinating, but unsettled,
question.

Introduction

Nearly ten years ago, Viemeister (1979) proposed a model to explain how
human observers detect amplitude modulation of a noise signal. In this
paper, we modify that model slightly and extend its application to the
detection of brief temporal gaps in noise. Measuring gap detection has
become an increasingly popular way of assessing temporal properties of
the auditory system (Penner, 1977; Fitzgibbons, 1983; Green, 1985).
Simulation using the modified model provides excellent predictions of
the thresholds obtained with partially filled gaps, as well as of their
psychometric functions. Despite this success, the computer simulations
indicate that the gap threshold is not strongly influenced by the two
major variables of the model, namely, filter bandwidth and integration
time. Thus, the applicability of this model to the understanding of
hearing impairment remains unclear.

The paper begins with a brief description of the detection tasks and
Viemeister's original model. Next, modifications of the original model
are described and our reasons for adopting them are explained. The
applicability of this modified model to partially filled noise gaps is
then described. Finally, we explore the model's predictions about how
gap threshold should change as a function of the two major parameters
of the model. We begin with a description of the task.

Detection task

All the detection data we will discuss were based on a choice between
two stimulus alternatives. One stimulus alternative was an
uninterrupted or continuous noise, which we refer to as the standard.
The other stimulus alternative was noise that was interrupted or altered
in amplitude in some fashion. One such alteration was a temporal gap in
the noise process, illustrated at the top left of Fig. 1. The second
alteration was amplitude modulation of the noise waveform, illustrated
at the bottom left of Fig. 1. These two alterations define two detection
tasks, called gap detection and modulation detection. Disregard the
right column of Fig. 1; it represents the output of a model that we will
discuss shortly. All the data reported in this paper were obtained from
one of these two detection tasks. The following is a brief summary of
the details of the stimulus.

Figure 1. Input and output waveforms for broadband noises with a gap
(top) or sinusoidally amplitude-modulated noise (bottom).

Two-alternative forced-choice procedures were used to estimate all
thresholds. The standard was either continuously present or was
presented for 500 ms and occurred in one of the two stimulus intervals.
The signal was presented in the other interval of the forced-choice
task. A two-down one-up adaptive procedure was used to estimate
threshold. We generally report the mean of three listeners' thresholds.
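A minimal sketch of a two-down one-up track of the kind described above. Under this rule the level steps down only after two consecutive correct responses and up after each error, so the track settles near the level giving about 70.7% correct (where p² = 0.5). The simulated observer's logistic psychometric function, the 2-dB step, the 0-dB starting level, and the averaging of reversal levels are all hypothetical choices for illustration, not details taken from these experiments.

```python
import math
import random

def p_correct(level_db, threshold_db=-15.0, slope=0.5):
    """Hypothetical 2AFC psychometric function: 50% correct far below threshold_db,
    75% at threshold_db, approaching 100% far above it (logistic in dB)."""
    return 0.5 + 0.5 / (1 + math.exp(-slope * (level_db - threshold_db)))

def two_down_one_up(n_trials=50, start_db=0.0, step_db=2.0, seed=0):
    """Run one adaptive track; return (threshold estimate, reversal levels)."""
    rng = random.Random(seed)
    level, run_correct, last_dir = start_db, 0, None
    reversals = []
    for _ in range(n_trials):
        if rng.random() < p_correct(level):
            run_correct += 1
            if run_correct < 2:
                continue                     # one correct response: level unchanged
            run_correct, direction = 0, -1   # two in a row: step down
        else:
            run_correct, direction = 0, +1   # any error: step up
        if last_dir is not None and direction != last_dir:
            reversals.append(level)          # the track changed direction here
        last_dir = direction
        level += direction * step_db
    # Average an even number of reversals, dropping the first one if the count is odd.
    usable = reversals[len(reversals) % 2:]
    estimate = sum(usable) / len(usable) if usable else float("nan")
    return estimate, reversals

estimate, reversals = two_down_one_up()
print(f"{len(reversals)} reversals, threshold estimate {estimate:.1f} dB")
```

With this assumed psychometric function, the 70.7%-correct point lies near -15.7 dB, so estimates averaged over many simulated tracks should fall close to that value; any single 50-trial track will scatter around it.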

Broadband noise was computer generated and presented over 12-bit D/A
converters at a rate of 25,000 points per second, and lowpass filtered
at 10,000 Hz. More details of the stimulus procedure can be found in
Forrest and Green (1987).


The simulations reported in this paper were obtained by programming
the model to act as a human observer. The input to the model was the
digital version of the signals heard by the observers. The model
analyzed two sound buffers corresponding to the two intervals of the
forced-choice procedure (standard and signal) and made a choice between
the two. The signal level was adjusted adaptively to estimate a
threshold for the model, just as it had been for the human observers.
All computations were carried out on a microcomputer (IBM AT or
equivalent). The human observers took about 5 minutes to run 50 trials
and to obtain a threshold estimate with about 10 to 15 reversals. The
model took about 3 to 10 times longer.

Gap detection procedure

The two stimulus alternatives of the gap detection task were: 1) the
standard waveform or 2) the signal waveform. The standard waveform was a
500-ms burst of noise of constant average level. The signal waveform was
also a 500-ms burst of noise, except that each sample from the temporal
center of the noise burst was scaled by a factor of (1 - k). The
duration of this attenuated segment was called the noise gap. The task
was to discriminate between these two alternatives. If k = 0.5, the
noise was reduced in level by 6 dB for the duration of the gap. If
k = 1.0, the noise was fully cancelled during the gap, a condition
typical of that used in most studies of gap detection.

An atypical part of the procedure used in these experiments was that
we randomized the level of each sound as it was presented. The level was
selected from a rectangular distribution with a range of 10 dB. The
median level of the noise was about 65 dB overall, 25 dB spectrum level.
We randomized the presentation level because the introduction of the gap
reduces the total energy in the noise waveform by an amount that depends
on the size of the gap and the amount of the attenuation. Randomization
discourages observers from trying to use overall level as a detection
cue, and makes the primary detection cue the temporal variation of noise
level within the half-second sample.
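A sketch of the signal construction for the gap task, assuming NumPy, Gaussian noise as a stand-in for the computer-generated noise, the 25,000-samples-per-second rate mentioned earlier, and a hypothetical 10-ms gap; the 10-kHz lowpass filtering is omitted. Scaling the gap by (1 - k) lowers its level by -20 log10(1 - k) dB, which is about 6 dB for k = 0.5, and the level rove draws a gain from a 10-dB rectangular distribution.

```python
import numpy as np

FS = 25_000        # samples per second (rate given in the text)
DUR_S = 0.5        # 500-ms noise burst

def gap_noise(gap_ms, k, rng):
    """Gaussian noise burst whose central gap_ms milliseconds are scaled by (1 - k)."""
    n = int(FS * DUR_S)
    x = rng.standard_normal(n)
    gap_len = int(round(FS * gap_ms / 1000))
    start = (n - gap_len) // 2
    x[start:start + gap_len] *= 1.0 - k      # k = 1 cancels the noise entirely
    return x

def rove(x, rng, range_db=10.0):
    """Apply an overall gain drawn from a rectangular distribution range_db wide."""
    offset_db = rng.uniform(-range_db / 2, range_db / 2)
    return x * 10 ** (offset_db / 20)

def gap_attenuation_db(k):
    """Level drop inside the gap relative to the surrounding noise."""
    return -20 * np.log10(1.0 - k) if k < 1 else np.inf

rng = np.random.default_rng(2)
signal = rove(gap_noise(gap_ms=10.0, k=0.5, rng=rng), rng)   # hypothetical 10-ms gap
standard = rove(gap_noise(gap_ms=0.0, k=0.0, rng=rng), rng)
print(f"k = 0.5 -> {gap_attenuation_db(0.5):.2f} dB")        # prints 6.02 dB
```

Because the rove is drawn independently for each interval, the small energy loss caused by the gap is buried in level differences of up to 10 dB between intervals.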

Modulation detection procedure

The two stimulus alternatives of the modulation detection procedure
were: 1) the standard waveform or 2) a signal waveform. The standard
waveform was a continuous noise presented throughout the 50 trials of
the two-alternative forced-choice task. The signal waveform, 500 ms in
duration, was a noise sample multiplied by a sinusoidally varying
envelope. Thus, the signal waveform may be described as

$$s(t) = [1 + m\cos(2\pi f_m t)]\,n(t) \qquad (1)$$

where $n(t)$ is the unmodulated or standard noise waveform, $f_m$ is the
rate of modulation in hertz, and $m$ is the degree of modulation.

A somewhat atypical part of the procedure used in these experiments
was that the signal waveform was adjusted in power, so that the average
powers of the signal and standard waveforms were equated. When noise is
amplitude modulated, the modulated waveform is increased in average
power by an amount that depends on the degree of modulation. The
expected, or average, power of $s(t)$, $\langle S\rangle$, is given by

$$\langle S\rangle = (1 + m^2/2)\,\langle N\rangle \qquad (2)$$

where $\langle N\rangle$ is the expected power of the standard noise. Thus, unless m = 0, a potential cue for detecting the presence of modulation is the increase in overall power caused by amplitude modulating the noise. This potential artifact was appreciated by Viemeister (1979) and is responsible for the asymptotic value of the threshold at high modulation frequencies, where m is large (see Fig. 7 of Viemeister, 1979). In all our experiments, we scaled the signal waveform so that its expected power was exactly $\langle N\rangle$, independent of the value of m.

Figure 2. Viemeister's three-stage model. Samples of the input waveform ($X_i$) pass through an initial bandpass filter with bandwidth W, then through a half-wave rectifier and a simple lowpass filter. A decision statistic (bottom) is computed from samples of the output waveform ($Y_i$).
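Eq. (2) and the power equalization just described can be checked numerically. The sketch below again assumes Gaussian noise and a 25-kHz sample rate (the function name and the `equalize` flag are hypothetical). It divides the modulated noise by the square root of (1 + m²/2), so that its expected power matches that of the unmodulated noise.

```python
import math
import random


def equalized_sam_noise(fm_hz, m, duration_s=0.5, fs_hz=25000, seed=None, equalize=True):
    """Modulated Gaussian noise per Eq. (1), optionally rescaled so its expected power
    equals that of the unmodulated noise (dividing out the (1 + m^2/2) factor of Eq. (2))."""
    rng = random.Random(seed)
    count = int(round(duration_s * fs_hz))
    gain = 1.0 / math.sqrt(1.0 + m * m / 2.0) if equalize else 1.0
    samples = []
    for i in range(count):
        envelope = 1.0 + m * math.cos(2.0 * math.pi * fm_hz * i / fs_hz)
        samples.append(gain * envelope * rng.gauss(0.0, 1.0))
    return samples


def mean_power(x):
    return sum(v * v for v in x) / len(x)


# Unit-variance noise, m = 1: expect about 1.5 without equalization and about 1.0 with it.
raw = mean_power(equalized_sam_noise(16, 1.0, duration_s=4.0, seed=3, equalize=False))
fixed = mean_power(equalized_sam_noise(16, 1.0, duration_s=4.0, seed=3))
```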

Viemeister’s  MTF  model 

The three stages of Viemeister's MTF model are shown in Fig. 2. The incoming signal is first bandpass filtered, with a filter of bandwidth W. Next, the signal is half-wave rectified. Finally, the output of the rectifier is smoothed with a simple one-stage (6 dB per octave) lowpass filter. The output of the model, $Y_i$, provides the input to a decision stage that selects which of the two stimulus alternatives contained the signal. In his original work, Viemeister used the variance of the Y values as the decision statistic. Such a statistic will be larger, on average, when the noise has been amplitude modulated, as shown in Fig. 1, which plots the output of the model for either a gap (top) or an amplitude-modulated (bottom) input.
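The three stages can be sketched as follows. This is a sketch under stated assumptions, not the simulation used in the report. A second-order bandpass filter in the form given by the Audio EQ Cookbook stands in for the first stage, and its bandwidth equals W only approximately. Its 3000-Hz center frequency is a hypothetical choice, since the text gives only the bandwidth. The 25-kHz sample rate is also assumed. The last stage is a one-pole (6 dB per octave) lowpass filter with time constant τ, and `variance` is Viemeister's original decision statistic.

```python
import math
import random


def bandpass(x, fs_hz, center_hz, bandwidth_hz):
    """Stage 1: second-order bandpass (Audio EQ Cookbook form), Q = center / bandwidth."""
    w0 = 2.0 * math.pi * center_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * center_hz / bandwidth_hz)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 = 0 for this filter
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for v in x:
        y0 = b0 * v + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, v
        y2, y1 = y1, y0
        out.append(y0)
    return out


def half_wave_rectify(x):
    """Stage 2: keep positive excursions, set negative ones to zero."""
    return [v if v > 0.0 else 0.0 for v in x]


def one_pole_lowpass(x, fs_hz, tau_s):
    """Stage 3: one-stage (6 dB/octave) lowpass with time constant tau_s."""
    c = 1.0 - math.exp(-1.0 / (fs_hz * tau_s))
    out, state = [], 0.0
    for v in x:
        state += c * (v - state)
        out.append(state)
    return out


def three_stage_model(x, fs_hz=25000, center_hz=3000.0, bandwidth_hz=4000.0, tau_s=0.003):
    """Return the model output Y_i for input samples X_i."""
    stage1 = bandpass(x, fs_hz, center_hz, bandwidth_hz)
    return one_pole_lowpass(half_wave_rectify(stage1), fs_hz, tau_s)


def variance(y):
    """Viemeister's decision statistic: the variance of the output samples."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(12500)]   # 500 ms of noise at 25 kHz
y = three_stage_model(x)
```

The lowpass stage is the discrete equivalent of an RC filter. Its step response reaches 1 - 1/e after one time constant.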

Our modification of the original model consisted of changing the decision statistic. Instead of using the variance of the output values, $Y_i$, we used the ratio, R, of the maximum to the minimum of $Y_i$ observed during the bulk of the observation interval. Specifically, we considered all values of $Y_i$ that occurred after the initial three time constants of the 500-ms observation interval, determined the maximum and minimum of those values, and computed their ratio, R. The decision rule assumed that the stimulus with the larger value of R was the signal. Such a decision rule is somewhat inefficient compared to the variance of $Y_i$ because it is based on only two of the many samples of $Y_i$ present during an observation interval. We adopted this rule for several reasons.

First, we wanted a decision statistic that would function sensibly even when the overall level of the sound changed, as it did in our gap detection experiment. The variance statistic would change systematically with overall level, whereas the expected value of R is independent of the overall sound level. Second, the R statistic produces the 3-dB-per-octave slope observed in psychophysical data at high modulation rates, as we shall now demonstrate.

Figure 3. Temporal modulation transfer function for human subjects (solid symbols) and a three-stage model simulation (open symbols). Model parameters are W = 4000 Hz and τ = 3 ms.
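The level-invariance argument for R can be illustrated with a short sketch. The model's two filters are linear, and half-wave rectification satisfies max(gx, 0) = g·max(x, 0) for any g > 0. Scaling the input by g therefore scales every $Y_i$ by g, which leaves R unchanged but multiplies the variance by g². The sketch applies R to an arbitrary positive output sequence. The function names, the 25-kHz sample rate, and the test sequence are assumptions for illustration.

```python
import math


def ratio_statistic(y, fs_hz=25000, tau_s=0.003, skip_time_constants=3):
    """R = max(Y) / min(Y) over the samples after the first few time constants."""
    start = int(round(skip_time_constants * tau_s * fs_hz))
    tail = y[start:]
    smallest = min(tail)
    return math.inf if smallest <= 0.0 else max(tail) / smallest


def variance(y):
    """Viemeister's original statistic, for comparison."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def choose_interval(y_first, y_second, **kwargs):
    """2AFC rule: report the interval (1 or 2) whose output has the larger R."""
    r1 = ratio_statistic(y_first, **kwargs)
    r2 = ratio_statistic(y_second, **kwargs)
    return 1 if r1 >= r2 else 2


# A made-up positive model output and the same output 20 dB louder (x10 in amplitude).
y = [1.0 + 0.5 * math.sin(i / 40.0) for i in range(12500)]
louder = [10.0 * v for v in y]
```

Here R is the same for `y` and `louder`, while the variance ratio is 100. A flat output has R = 1, so the rule picks the fluctuating interval.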

Modulation  detection  data 

Figure 3 shows the average data of our observers (solid symbols), as well as the data from our simulation (open symbols). For these simulations, the first-stage bandwidth was 4000 Hz and the time constant of the lowpass filter was 3 ms. As can be seen, the data appear to fall along a 3-dB-per-octave line at high modulation frequencies. This is somewhat unexpected, since the final lowpass filter has an attenuation skirt of 6 dB per octave. We believe that this shallower slope arises because, as the modulation frequency increases, a greater number of potential maxima and minima are produced. This increase in the number of extrema increases the number of potential signals observed in any observation interval and ameliorates the rapid fall in sensitivity that one would expect from the attenuation produced by the lowpass filter. A problem with this explanation is that sensitivity below the cutoff frequency is constant and independent of modulation rate (Fig. 3). Viemeister's model, which uses the variance as the decision statistic, will produce a 6-dB-per-octave decline at high frequencies if the noise samples are equalized in overall power, as we have shown elsewhere (Forrest and Green, 1987). Viemeister's original data do not show the 6-dB-per-octave slope because the noise was not equalized in power (ibid., see Fig. 9). We are now in a position to compare the computer simulations with data obtained from human observers in the gap experiments.

Figure 4. Gap detection data for partially filled gaps in noise for human subjects (solid symbols) and the three-stage model (open points). Model parameters are W = 4000 Hz and τ = 3 ms.
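The 6-dB-per-octave skirt mentioned above follows from the magnitude response of a one-pole lowpass filter, |H(f)| = 1/sqrt(1 + (2πfτ)²). The short calculation below uses the 3-ms time constant of the simulations. It places the cutoff near 53 Hz and shows the attenuation growing by about 6 dB per octave well above it. It describes only the filter; it does not reproduce the 3-dB-per-octave threshold slope, which the text attributes to the decision statistic.

```python
import math


def one_pole_attenuation_db(f_hz, tau_s):
    """Attenuation in dB of a one-pole (RC) lowpass with time constant tau_s at f_hz."""
    x = 2.0 * math.pi * f_hz * tau_s
    return 10.0 * math.log10(1.0 + x * x)


tau = 0.003                                   # 3-ms time constant used in the simulations
cutoff_hz = 1.0 / (2.0 * math.pi * tau)       # -3-dB point, roughly 53 Hz
octave_step_db = one_pole_attenuation_db(3200, tau) - one_pole_attenuation_db(1600, tau)
```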

Gap  detection  data 

Figure 4 shows the data for the detection of partially filled gaps in noise. The figure presents data obtained from human observers (solid points) and computer simulations for corresponding conditions (open symbols); again, the parameters of the simulation were W = 4000 Hz and τ = 3 ms. As can be seen, the fit of the model to the data is very satisfactory. Thus, a single model produces reasonably good predictions of both the gap (Fig. 4) and the modulation detection data (Fig. 3).

One  of  the  striking  characteristics  of  gap  thresholds  is  their  stability.  This 
stability  arises  in  part  because  of  the  steepness  of  the  psychometric  function. 
For k = 1.0 (complete cancellation), the psychometric function for both the
human  observers  and  the  model  has  a  range  of  only  1  ms!  For  smaller  values  of 
k  (0.50  and  0.35)  the  psychometric  functions  are  less  steep  and  human 
detection  performance  is  actually  superior  to  that  obtained  with  the  model. 
We  are  presently  exploring  a  variety  of  different  ideas  on  how  to  alter  the 
computer  simulation  so  that  its  predictions  will  better  mimic  the  human  data. 
The  urgency  of  this  project  will  be  apparent  when  we  consider  how  gap 
detection  thresholds  change  with  the  other  parameters  of  the  model,  namely, 
bandwidth  and  integration  time. 
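The role of k can be made concrete with a sketch of a partially filled gap. This section states only that k = 1.0 means complete cancellation. The sketch therefore assumes that k is the fractional amplitude reduction inside the gap, so that the gap samples are scaled by (1 - k). The gap's central placement, the Gaussian noise, the 25-kHz rate, and the function name are also assumptions.

```python
import random


def partially_filled_gap(gap_ms, k, duration_s=0.5, fs_hz=25000, seed=None):
    """Gaussian noise with a centred gap whose samples are scaled by (1 - k).

    Treating k as the fractional amplitude reduction (k = 1.0 -> silent gap) and
    centring the gap are assumptions made for illustration.
    """
    rng = random.Random(seed)
    count = int(round(duration_s * fs_hz))
    gap_len = int(round(gap_ms * 1e-3 * fs_hz))
    start = (count - gap_len) // 2
    samples = [rng.gauss(0.0, 1.0) for _ in range(count)]
    for i in range(start, start + gap_len):
        samples[i] *= 1.0 - k
    return samples, start, gap_len


silent, start, length = partially_filled_gap(3.0, 1.0, seed=0)   # 3-ms silent gap
half, _, _ = partially_filled_gap(3.0, 0.5, seed=0)              # same noise, k = 0.5
```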


Gap detection as a function of W and τ

Because  estimates  of  gap  thresholds  are  stable,  they  are  often  touted  as  an 
excellent  way  to  assess  temporal  parameters  of  hearing-impaired  listeners. 
Clinical  investigators  have  often  reported  that  gap  thresholds  for  hearing- 
impaired  listeners  are  appreciably  greater  than  those  obtained  from  normal 
listeners.  Gap  thresholds  for  the  hearing-impaired  may  be  in  the  10-  to  20-ms 
range  when  measured  with  k  =  1.0  in  broadband  noise  (Formby,  personal 
communication).  Other  experiments  show  that  gap  thresholds  increase 
systematically,  for  both  normal  and  hearing-impaired  listeners,  as  the 
bandwidth  of  the  noise  is  decreased  (Fitzgibbons  and  Wightman,  1982; 
Fitzgibbons,  1983;  Shailer  and  Moore,  1983;  Buus  and  Florentine,  1985).  The 
gap  thresholds  found  in  these  experiments  are  factors  of  3  to  5  larger  than  the 
typical  gap  threshold  value  of  2-3  ms  found  with  most  normal  observers  in 
broadband noise. We naturally wondered whether we could alter the parameters of the computer model to produce data that would simulate such large gap thresholds. The following table shows how the simulated gap threshold changes as the two major parameters of the model are altered. The columns show the variation in bandwidth, and the rows are different time-constant values.

Table 1. Effect of bandwidth (W) and tau (τ) on simulated gap detection thresholds. Each entry is the mean duration, in ms, of a silent gap needed to achieve about 70% correct in an adaptive task. The standard deviation of these estimates is about 17% of the mean.

                    Bandwidth (Hz)
Tau (ms)      400     800    1600    3200
   1.5       6.20    3.92    2.01    1.31
   3.0       6.16    3.76    2.28    1.40
   6.0       7.57    4.67    2.73    1.95
  12.0       7.62    5.21    3.17    2.55

The first thing to note about the table is the relatively small change in gap threshold caused by altering the time constant. A change of nearly an order of magnitude in the time constant (from 1.5 to 12 ms) increases the gap threshold by only a factor of two, and then only at the largest bandwidth. For the smaller bandwidths, which are needed to produce any significant increase in gap threshold, the changes with τ are minuscule. To achieve gap thresholds approaching the measured values of 10 to 20 ms, we would need to assume totally unreasonable parameter values for the model.
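The size of the τ effect can be read directly from Table 1. The snippet below simply transcribes the table and computes, for each bandwidth, how much the simulated threshold grows when τ rises eightfold, from 1.5 to 12 ms. It also reports the largest simulated threshold in the table.

```python
# Simulated gap thresholds (ms) from Table 1: rows are tau (ms), columns are bandwidth (Hz).
BANDWIDTHS_HZ = [400, 800, 1600, 3200]
THRESHOLDS_MS = {
    1.5: [6.20, 3.92, 2.01, 1.31],
    3.0: [6.16, 3.76, 2.28, 1.40],
    6.0: [7.57, 4.67, 2.73, 1.95],
    12.0: [7.62, 5.21, 3.17, 2.55],
}

# Growth in threshold when tau rises eightfold, from 1.5 ms to 12 ms, at each bandwidth.
growth = {
    bw: slow / fast
    for bw, fast, slow in zip(BANDWIDTHS_HZ, THRESHOLDS_MS[1.5], THRESHOLDS_MS[12.0])
}
largest_ms = max(max(row) for row in THRESHOLDS_MS.values())

for bw, factor in growth.items():
    print(f"{bw:>4} Hz: x{factor:.2f}")
print(f"largest simulated threshold: {largest_ms} ms")
```

The growth factors run from about 1.23 at 400 Hz to about 1.95 at 3200 Hz. No entry reaches 10 ms; the largest is 7.62 ms.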

We  are  now  exploring  how  the  introduction  of  internal  noise  at  different 
stages  of  the  model  will  alter  this  situation.  At  present,  we  can  only  say  that 
the  model  gives  reasonably  good  predictions  for  normal  hearing  listeners,  but 
is  not  particularly  useful  in  interpreting  the  results  obtained  from  listeners 
with  abnormally  large  gap  thresholds.  Indeed,  changes  in  the  major  temporal 
parameter of the model, τ, produce surprisingly little variation in the size of
the  gap  threshold. 
Comments

Patterson: 

It  is  difficult  to  understand  the  motivation  behind  a  model  of  hearing  that 
ignores  cochlear  filtering  and  assumes  that  a  wideband  signal  is  passed 
directly  through  to  the  broad  lowpass  process  that  determines  the  MTF. 
Would  it  not  be  better  to  assume  that  bandwidth  effects  are  the  result  of 
combining  the  outputs  of  sets  of  adjacent  auditory  filters,  and  thereby  make 
the  model  a  lot  more  realistic? 

Reply  by  Green: 

I  have  always  believed  that  the  primary  test  of  a  theory  was  its  ability  to 
predict  the  data,  not  whether  the  assumed  processes  mimicked  our  current 
understanding  of  how  the  system  functions  on  a  more  molecular  level.  Indeed, 
it seems to me that the theory or model must be simpler than the process it hopes to explain, at least in some respects; otherwise, it achieves no economy of understanding. The present model has only two free parameters (bandwidth and integration time) and predicts with fair accuracy the results of normal-hearing listeners in two experimental situations (see Fig. 3 and Fig. 4).
But,  as  Table  1  indicates,  it  does  not  provide  much  understanding  of  hearing- 
impaired  listeners. 

For  that  reason  we  have  been  exploring  a  model  much  like  that  described  in 
your comment. That model, a series of parallel, narrow-band channels, raises
the issue of how the outputs of these several independent channels are
combined.  This  is  not  an  issue  where  more  molecular  investigations  provide 
much  insight.  We  are  presently  exploring  a  number  of  different  decision  rules 
but,  as  yet,  have  nothing  to  report. 




Hearing  Research,  32  (1988)  147-164 
Elsevier 




HRR  01032 


Profile  analysis:  Detecting  dynamic  spectral  changes 

David  M.  Green  and  Quang  T.  Nguyen 

Psychoacoustics  Laboratory,  Psychology  Department,  University  of  Florida,  Gainesville,  Florida,  USA 
(Received  27  August  1987;  accepted  17  November  1987) 


This paper explores how amplitude modulation influences the detection of changes in spectral shape. We generally used a complex of 21 equal-amplitude components, with the lowest frequency at 200 Hz, the highest at 5000 Hz, and equal logarithmic spacing between components. The signal was an increase in level of one or more components of the complex. The overall level of the sound varied randomly over a 20-dB range. Three experiments are reported. In the first, we determined how the modulation of a single-frequency component influenced the detection of an amplitude change in that region. In the second experiment, the signal was an alteration of the entire spectrum, and that alteration was subjected to various forms of amplitude modulation. In neither experiment did modulation generally increase the detectability of the signal. Finally, in the third experiment, we determined the effects of modulating the ‘signal’ and ‘nonsignal’ parts of the spectrum in different relative phases. The results of this experiment showed that the relative phase is important only for modulation rates slower than about 40 Hz. For faster rates, the temporal structure of the spectrum is unimportant. Thus, for modulation rates above 40 Hz, only the power spectrum of the stimulus is critical.


Psychoacoustics;  Intensity  discrimination;  Amplitude  modulation;  Profile  analysis 


Introduction 

The salience of any component of a stationary, multicomponent complex is greatly enhanced by a brief change in practically any parameter of that component: amplitude, phase, or frequency. A component or set of components, previously unnoticed in a stationary spectrum, suddenly becomes prominent when those components are briefly varied in amplitude. Such amplitude variation, as long as it differs from that of the remainder of the spectrum, is an effective way to highlight or segregate a particular set of components. Although the effects are obvious from casual observation, there is remarkably little experimental evidence documenting these claims. Viemeister (1980) has published one of the few systematic investigations of these phenomena. Summerfield et al. (1987) have exploited this idea to produce vowel-like spectra. McAdams (1984) has used both amplitude and frequency modulation to produce a number of dramatic demonstrations that clearly establish the saliency of temporal variation in a multicomponent complex.

In our recent work, we have been exploring the detection of amplitude changes in a multicomponent complex, what we call profile analysis (Green, 1987; Bernstein et al., 1987). We wondered how amplitude modulation of the altered components might affect the detection of such changes. In the first two experiments, we explore how such amplitude variation influences the detection of changes in spectral shape for such multicomponent complexes. In the first experiment, a single component of a 21-component complex was changed in level. We wished to determine whether amplitude variation of this component would affect the detection of a small change in its average amplitude. In the second experiment, the signal was a more complex change in the shape of the spectrum; for example, alternate components were increased or decreased in amplitude. Again, we wished to determine how such amplitude variation would influence the detection of such spectral changes.

The third experiment explored another facet of how amplitude variation of the spectrum may affect the ability to detect a change in spectral shape. In most profile experiments, the overall level of the sounds is varied on each and every presentation in an effort to ensure that the primary detection cue is a change of relative level at different spectral loci, rather than a change in absolute level at some single-frequency location. Suppose the spectral change occurs at a single-frequency locus. The detection of such a change requires the observer to compare the level at the signal part of the spectrum (where the amplitude change may occur) with some other nonsignal part (where no amplitude change can occur). We have described this comparison as a simultaneous comparison of level, to distinguish it from the successive comparison of level common to many other psychophysical tasks. Suppose the signal and nonsignal parts of the spectrum are now sinusoidally modulated in amplitude at the same frequency but with different relative phases. We can, for example, make the signal and nonsignal parts of the spectrum wax and wane either in phase or out of phase. How will this relative phase influence detection of the signal, and how will the thresholds for the in-phase and out-of-phase conditions vary with the frequency of modulation? These are the main questions of the last set of experiments.

General  methods  and  procedures 

All  stimuli  were  generated  using  an  IBM-XT 
microcomputer  and  a  Data  Translation  DT-2801A 
interface  board  for  D  to  A  conversion.  The  stimuli 
were  all  digitally  computed  and  played  over  the 
12-bit  D  to  A  converter  at  a  sample  rate  of  25  000 
points  per  second.  All  stimuli  were  lowpass 
filtered;  the  filtered  output  was  3  dB  down  at  6000 
Hz,  and  20  dB  at  6750  Hz. 

The observers listened binaurally through Sennheiser model HD414SL earphones, with both earphones driven in phase. The listeners were seated in sound-treated rooms and responded using the computer's keyboard. Events within the trial cycle, as well as feedback after each response, were signaled via the computer's monitor. Three listeners served in each of the three experiments. One of the authors, QN, participated in all three experiments. The other observers were students at the University. One of the students observed in all three experiments. A second observed in only the first and second experiments and was replaced by a third student who observed only in the third experiment. The observers listened for about two hours daily. The student observers were paid an hourly rate plus a bonus upon the completion of the experiment.

In all the detection tasks, the signal was an alteration in the spectrum of some multicomponent complex, which we call the ‘standard’. Typically, the standard was a 21-component complex; in one experiment, a 7-component complex was used. The components of the standard were always equal in amplitude, and their frequencies were spaced equally on a logarithmic scale that extended from 200 to 5000 Hz. A signal consisted of an increase in the intensity of one or more components of the standard. The sound was presented for 500 ms. The onsets and offsets were shaped by a 20-ms cosine-squared envelope. In different experiments, the signal component and/or some components of the standard were amplitude modulated. Later, we will describe these dynamic conditions in more detail and will also describe how the signal was measured.
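The frequency layout of the standard can be checked with a short calculation. For 21 components spaced equally on a log scale from 200 to 5000 Hz, the frequency ratio between neighbors is 25^(1/20), about 1.175. The second, middle, and penultimate components then fall at the 235-, 1000-, and 4257-Hz signal frequencies used in the first experiment. The ramp function is one plausible reading of "20-ms cosine-squared envelope" (a sin² rise and fall); the function names are assumptions.

```python
import math


def log_spaced_frequencies(n=21, f_lo=200.0, f_hi=5000.0):
    """Component frequencies equally spaced on a log scale from f_lo to f_hi (Hz)."""
    ratio = (f_hi / f_lo) ** (1.0 / (n - 1))
    return [f_lo * ratio ** k for k in range(n)]


def cos2_ramp_gain(t_s, duration_s=0.5, ramp_s=0.020):
    """Gain of a 20-ms cosine-squared onset/offset envelope at time t_s (seconds)."""
    if t_s < ramp_s:
        return math.sin(0.5 * math.pi * t_s / ramp_s) ** 2
    if t_s > duration_s - ramp_s:
        return math.sin(0.5 * math.pi * (duration_s - t_s) / ramp_s) ** 2
    return 1.0


freqs = log_spaced_frequencies()
print([round(f) for f in (freqs[1], freqs[10], freqs[19])])   # [235, 1000, 4257]
```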

An adaptive two-alternative forced-choice procedure was used to estimate the signal threshold. The adaptive procedure (3-down/1-up) estimates a signal level corresponding to a probability of being correct equal to 0.794. The initial step size for the signal was 4 dB and was reduced to 2 dB after three reversals. The threshold value of the signal was estimated from the average of the last even number of reversals in a 50-trial block, excluding the first three reversals. An average of about 11 reversals was obtained. We report the signal threshold as the level of the signal re the amplitude of the component of the standard to which it is added. Thus, a typical signal threshold of about -18 dB corresponds to an increment at the signal component of 1 dB re the other components of the standard. At the start of each adaptive run, the signal level was equal to the level of the component of the standard to which it was added; that is, the component after addition of the signal was 6 dB re the level of the other components of the standard. The overall intensity level of the sound presented in each interval was chosen randomly from a rectangular distribution ranging from -10 dB to 10 dB re the median level. This procedure discourages the listener from using absolute intensity as a cue for detecting the signal. The median level of each component of the standard was 62 dB SPL; the overall level of the 21-component standard was, therefore, 75 dB SPL.
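The tracking rule described above can be sketched as a small simulation. A 3-down/1-up track converges where the probability of three correct responses in a row is 0.5, that is, where P(correct) = 0.5^(1/3) ≈ 0.794. The simulated listener's logistic psychometric function is hypothetical, with its midpoint at -18 dB and a 2-dB spread; only the step sizes, trial count, starting level, and reversal-averaging rule come from the text.

```python
import math
import random


def p_correct(level_db, midpoint_db=-18.0, spread_db=2.0):
    """Hypothetical 2AFC psychometric function: 0.5 for weak signals, rising toward 1.0."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(level_db - midpoint_db) / spread_db))


def run_adaptive_block(rng, n_trials=50, start_db=0.0):
    """Simulate one 3-down/1-up block and return its threshold estimate (dB), or None."""
    level, step = start_db, 4.0
    n_correct, last_move, reversals = 0, None, []
    for _ in range(n_trials):
        if rng.random() < p_correct(level):
            n_correct += 1
            if n_correct < 3:
                continue                 # need three correct in a row before stepping down
            n_correct, move = 0, -1      # three correct: make the signal weaker
        else:
            n_correct, move = 0, +1      # one error: make the signal stronger
        if last_move is not None and move != last_move:
            reversals.append(level)      # track changed direction at this level
            if len(reversals) == 3:
                step = 2.0               # finer steps after three reversals
        last_move = move
        level += move * step
    usable = reversals[3:]               # discard the first three reversals
    if len(usable) % 2:
        usable = usable[1:]              # keep the last even number of reversals
    return sum(usable) / len(usable) if usable else None


rng = random.Random(7)
estimates = [e for e in (run_adaptive_block(rng) for _ in range(200)) if e is not None]
mean_threshold = sum(estimates) / len(estimates)
# Level at which the hypothetical listener is correct with probability 0.5 ** (1/3).
target_db = -18.0 - 2.0 * math.log(0.5 / (0.5 ** (1 / 3) - 0.5) - 1.0)
```

With only 50 trials per block, individual estimates scatter by a few dB. Their average should land near `target_db`, about -17.3 dB for this hypothetical listener.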

Single-component  signal 

Stimuli 

In our first experiment, the signal to be detected was an alteration in the amplitude of a single component of a 21-component complex. The frequency of the ‘signal’ component was either a low frequency (235 Hz), the second component of the complex; a medium frequency (1000 Hz), the middle component of the complex; or a high frequency (4257 Hz), the penultimate component of the 21-component complex. We compare the threshold for three ‘dynamic’ conditions with a ‘stationary’ condition in which the signal component was constant in amplitude during the entire observation interval and somewhat larger in amplitude than the 20 other components. This ‘stationary’ condition was the one used in most previous experiments on profile analysis. We know from previous research (Green and Mason, 1985; Green et al., 1987) that such a change in spectral shape will be most easily detected when the ‘middle’ component of the spectrum is the signal component. In the dynamic conditions, three experimental manipulations were used to assess how temporal variations at the signal component influenced the detectability of spectral alterations of the standard. All three conditions involved some form of amplitude modulation, and the frequency of modulation, $f_m$, was the major independent variable, ranging from 2 to 640 Hz. The three conditions are precisely described in the Appendix, where the equations used to generate the waveforms are presented. Fig. 1 provides a graphic presentation of the three basic manipulations. Let us describe the three dynamic conditions depicted in Fig. 1.


The three panels of Fig. 1 illustrate the three experimental (dynamic) conditions. We used logarithmic scales of amplitude and frequency in order to roughly represent the effective spectra of these stimuli as processed in the auditory periphery. The standard 21-component complex is represented in the panel to the left, and the effects of adding the signal to the complex are represented in the panel on the right. To improve the clarity of the figure, we show only five components of the 21-component complex: the first, sixth, eleventh, sixteenth, and twenty-first. The signal is always represented as affecting the middle, or 1000-Hz, component. In constructing these illustrations, we have selected a single value, 50 Hz, for the frequency of modulation, the main independent variable of the dynamic conditions. We also selected a signal level of -18 dB as a representative threshold value for the chief dependent variable; the signal is about 12% of the amplitude of any other component in the complex. When the signal component is added (in phase) to the component of the standard, it makes that component about 1 dB larger. (In terms of the equations of the Appendix, $20\log(a_s/a) = -18$ dB.)
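The decibel arithmetic behind these statements is easy to verify. A -18-dB signal has an amplitude ratio of 10^(-18/20), about 0.126 (the "about 12%" above). Added in phase, it raises the component by 20 log(1.126), about 1 dB. The snippet also checks the two level figures from the methods: the 6-dB starting increment (signal equal to the component) and the 75-dB SPL overall level of 21 equal components at 62 dB SPL.

```python
import math

a_s_over_a = 10 ** (-18.0 / 20.0)                  # signal amplitude re standard component
increment_db = 20 * math.log10(1.0 + a_s_over_a)   # level of a + a_s re a (in-phase addition)
start_db = 20 * math.log10(1.0 + 1.0)              # a_s = a at the start of each adaptive run
overall_db = 62.0 + 10 * math.log10(21)            # 21 equal components at 62 dB SPL each

print(f"{a_s_over_a:.3f} {increment_db:.2f} {start_db:.2f} {overall_db:.1f}")
# prints: 0.126 1.03 6.02 75.2
```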

In Condition 1, the standard (top left, Fig. 1) is a set of 21 sinusoidal components, all equal in amplitude. The signal is sinusoidally modulated in amplitude, $a_s[1 - \cos(2\pi f_m t)]$, before being added to the corresponding component of the complex. The resulting spectrum (top right, Fig. 1) is a set of 20 stationary, equal-amplitude components and one amplitude-modulated component, what we call the ‘signal component’, $f_s$. In the long-term amplitude spectrum, there is a small increase in level of the signal component as well as the addition of two sidebands located in frequency at $f_s + f_m$ and $f_s - f_m$. The sidebands are low in amplitude (24 dB below the signal component). At low rates of modulation, this condition is virtually identical to the stationary profile condition, except that the amplitude of the signal component varies slightly, from $a + 2a_s$ to $a$, at the frequency of the modulation.

In Condition 2, the standard (middle left, Fig. 1) is a set of 21 sinusoidal components, all equal in amplitude, but the signal component, $f_s$, is also 100% amplitude modulated, $[1 - \cos(2\pi f_m t)]$. This produces two sidebands at frequencies $f_s + f_m$ and $f_s - f_m$. The level of the sidebands is 6 dB less than the level of the component that is modulated. For this condition, the signal is also amplitude modulated, $a_s[1 - \cos(2\pi f_m t)]$, before being added to the corresponding component of the complex (see middle right, Fig. 1). At low rates of modulation, this is similar to the stationary profile condition, except that the amplitude of the signal component varies considerably, from $2a + 2a_s$ to 0, at the frequency of modulation. In the long-term amplitude spectrum, the signal produces a small increase in the amplitude of the signal component, as well as two relatively large sidebands located symmetrically around that component.

Fig. 1. Schematic of the spectra used in the three dynamic conditions of the single-component-signal experiment. Only five components of the 21-component spectra are represented. The left panels are the spectra of the standard alone. The right panels are the spectra of the signal-plus-standard. The amplitude modulation condition is represented by sidebands near the central component. The signal amplitudes depicted are typical of those measured in the experiment.

In Condition 3, the standard (bottom left, Fig. 1) is again a set of 21 sinusoidal components, all equal in amplitude. For this condition, the signal is multiplied by a sinusoid, $a_s\cos(2\pi f_m t)$ (so-called ‘suppressed carrier modulation’), before being added to the corresponding component of the complex. The resulting spectrum (bottom right, Fig. 1) is similar to the standard spectrum, but with two small sidebands located at frequencies $f_s + f_m$ and $f_s - f_m$. At low rates of modulation, the signal component waxes ($a + a_s$) and wanes ($a - a_s$) and has an average amplitude of $a$, equal to that of the signal component of the standard spectrum. The sidebands are low in amplitude (24 dB down from the average amplitude of the signal component). The long-term amplitude spectrum is the same as the flat spectrum of the standard, except for the presence of some slight energy in the two sidebands located symmetrically around the signal frequency.
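The spectral claims for the three conditions can be checked numerically. The sketch builds only the signal-frequency component of the signal-plus-standard waveform, using the Fig. 1 values (1000-Hz component, 50-Hz modulation, -18-dB signal) and the 25-kHz sample rate. It omits the other 20 components and the onset ramps and measures line amplitudes with a single DFT bin. The envelope functions are our reading of the conditions as described in this section, not a transcription of the Appendix equations.

```python
import math

FS = 25000                        # Hz, sample rate reported for this study
N = FS // 2                       # 500 ms -> 12500 samples
FC, FM = 1000.0, 50.0             # signal-component and modulation frequencies from Fig. 1
A = 1.0                           # amplitude a of each standard component
A_S = 10 ** (-18.0 / 20.0)        # signal amplitude a_s at a -18-dB threshold


def cond1(t):  # stationary component plus AM signal
    return A + A_S * (1.0 - math.cos(2.0 * math.pi * FM * t))


def cond2(t):  # AM component (standard) plus AM signal
    return (A + A_S) * (1.0 - math.cos(2.0 * math.pi * FM * t))


def cond3(t):  # stationary component plus suppressed-carrier signal
    return A + A_S * math.cos(2.0 * math.pi * FM * t)


def waveform(envelope):
    """Signal-plus-standard samples at the signal frequency: envelope(t) * cos(2*pi*FC*t)."""
    return [envelope(i / FS) * math.cos(2.0 * math.pi * FC * i / FS) for i in range(N)]


def amplitude_at(x, f):
    """Amplitude of the spectral line at f Hz (f must complete whole cycles in x)."""
    re = sum(v * math.cos(2.0 * math.pi * f * i / FS) for i, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * f * i / FS) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)


def db_re_a(amplitude):
    return 20.0 * math.log10(amplitude / A)


x1, x2, x3 = waveform(cond1), waveform(cond2), waveform(cond3)
side1 = db_re_a(amplitude_at(x1, FC + FM))     # about -24 dB re a
side2_vs_carrier = db_re_a(amplitude_at(x2, FC + FM)) - db_re_a(amplitude_at(x2, FC))
carrier3 = amplitude_at(x3, FC)                # about a: long-term carrier unchanged
```

Each sideband in Conditions 1 and 3 has amplitude $a_s/2$, about 24 dB below $a$. In Condition 2 the sidebands are half the modulated component's amplitude, about 6 dB down.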

Results  and  discussion 

The results are presented in Fig. 2. For each experimental condition, the threshold for the signal was determined at nine different modulation frequencies ranging from 2 to 640 Hz. The signal threshold is plotted as a function of modulation frequency for each of the three signal frequencies in the separate panels of the figure. The threshold reported is the average over three observers. Although the observers often differ from each other, the major trends are well represented in the average data. The error bars represent the standard error of the mean threshold (twelve threshold determinations from each of the three observers).

The threshold of the signal is expressed as $20\log(a_s/a)$ (see Appendix). In the modulation conditions, the amplitude of the signal component will wax and wane, but no attempt has been made to calculate an ‘equivalent’ signal value. The three different modulation conditions are coded in the figure: a square, a triangle, and a circle represent the thresholds obtained for Conditions 1, 2, and 3, respectively. The solid horizontal line represents the threshold obtained in the stationary profile condition, and its value is indicated below the line. In this stationary condition, the signal component is $a$ when the standard is presented and $a + a_s$ when the signal-plus-standard is presented (see Appendix). Because the overall level of the spectrum varies randomly over a 20-dB range, the observer must listen for some change in the shape of the spectrum, either dynamic or steady state, rather than any change in absolute amplitude.


Before  beginning  the  discussion  of  the  results, 
one  should  recall  that  the  sounds  were  presented 
for  a  duration  of  one-half  second.  Thus,  for  the 
low  modulation  frequencies,  only  a  few  cycles  of 
the  modulation  were  presented.  The  relatively  short 
observation  interval  probably  inflates  some  of  the 
threshold  values  for  these  lower  modulation  rates. 

The first general observation is that the dynamic cues do not greatly improve the detectability of the signal; in fact, they make the detection of a spectral change more difficult. Practically all the data points from the various dynamic conditions lie above the horizontal line that represents the average threshold for the stationary condition. Consider in particular Condition 2. At least for the lower modulation rates, the signal component is very salient because it is always 100% amplitude modulated (in both the standard and the signal-plus-standard). This amplitude fluctuation makes the signal frequency clearly evident when listening to the multicomponent complexes. Despite this increased saliency, the spectral alteration is generally harder to hear when it varies in time than when it is presented at a fixed level.

We should qualify this observation by noting that the relation between the thresholds obtained in the dynamic conditions and the stationary threshold appears to depend on signal frequency. At the low and middle signal frequencies, the dynamic cues are largely detrimental. At the highest signal frequency, they do not notably impair the detection of the signal and may slightly aid detectability. In fact, for the 4257-Hz signal condition, temporal variation in signal amplitude produces somewhat lower thresholds (Conditions 1 and 3), at least for the slower and moderate rates of modulation. This result was observed for all three listeners and represents the only conditions where the dynamic presentation made the change in spectral shape easier to hear than the simple steady-state condition. These conclusions obviously depend on how the signal is measured.

One might argue that the dynamic thresholds should be ‘corrected’ by the change in power of the signal caused by the modulation, rather than expressed simply as $20\log(a_s/a)$. If this procedure is followed, then the thresholds for Conditions 1 and 2 (amplitude modulation) should be increased by 1.7 dB [$10\log(1 + \tfrac{1}{2})$]. Condition 3 (suppressed-carrier modulation) decreases the signal level by 3 dB ($10\log\tfrac{1}{2}$). Thus, one can mentally increase the thresholds of Conditions 1 and 2 by 1.7 dB and decrease the threshold determined in Condition 3 by 3 dB. If this ‘correction’ is made, then Condition 3 is slightly better than the steady-state condition at the highest signal frequency, about the same at 1000 Hz, and remains much poorer at the lowest frequency.

Fig. 2. Results of the single-component-signal experiment. The ordinate is the threshold for the signal, $20\log(a_s/a)$ (see Appendix). The abscissa is the frequency of modulation, $f_m$. The three signal frequencies are plotted in separate panels. The data for the various dynamic conditions are coded: square, Condition 1; triangle, Condition 2; circle, Condition 3. The horizontal line (and number) in each panel is the threshold for the stationary profile condition. Each data point is the average threshold for the three observers.
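The sizes of the two ‘corrections’ can be checked numerically. The sketch below assumes (our assumption, not a transcription of the Appendix) that Conditions 1 and 2 use a 100%-modulated envelope $1 + \cos\phi$ and Condition 3 a suppressed-carrier envelope $\cos\phi$, and compares the mean-square envelope over one modulation period with that of an unmodulated envelope equal to 1. The carrier's own factor of ½ is common to all cases and cancels.

```python
import math

def mean_square(envelope, n=1000):
    """Mean of envelope(phase)**2 over one modulation period, sampled at n points."""
    return sum(envelope(2 * math.pi * i / n) ** 2 for i in range(n)) / n

def full_am(phase):
    """Assumed envelope for Conditions 1 and 2: 100% amplitude modulation."""
    return 1.0 + math.cos(phase)

def suppressed(phase):
    """Assumed envelope for Condition 3: suppressed-carrier modulation."""
    return math.cos(phase)

# The unmodulated envelope is 1 (mean square 1), so these are power changes re no modulation.
am_db = 10 * math.log10(mean_square(full_am))
sc_db = 10 * math.log10(mean_square(suppressed))
print(f"full AM: {am_db:+.2f} dB, suppressed carrier: {sc_db:+.2f} dB")
```

The mean squares come out as 1.5 and 0.5, that is, about +1.76 dB and −3.01 dB, which the text rounds to 1.7 dB and 3 dB.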


Condition 2, when the data are translated upward, would never produce a threshold lower than that obtained in the stationary condition. Condition 1, when corrected upward, would always be worse at the lowest signal frequency, mostly worse at the middle signal frequency, and equal to or slightly better than the stationary condition at the highest signal frequency. Unfortunately, until we learn more, there is no way to know the best way to measure the stimulus.

We should comment on the threshold levels obtained in Condition 3. In this condition, the task is to detect amplitude modulation of the signal component, with a number of other nonmodulated components present in the spectrum. If the nonmodulated components were not present, then this task would be equivalent to detecting amplitude modulation of a single sinusoid. For this condition, our measure of threshold, $20\log(a_s/a)$, is equal to $20\log m$, where $m$ is the degree of amplitude modulation in $[1 + m\cos(2\pi f_m t)]\cos(2\pi f_c t)$. Zwicker (1959) has measured the threshold for the detection of amplitude modulation of a single sinusoid. He finds, at low modulation rates, a threshold of about −25 dB for a low-frequency carrier, about −30 dB for a 1000-Hz carrier, and about −35 dB for a high-frequency carrier. For all three carrier frequencies, he finds that the thresholds increase about 10 dB as the rate of modulation increases from 2 to 80 Hz. If our observers are as sensitive to modulation as Zwicker’s, then one is forced to conclude that the nonmodulated components of Condition 3 exercise a considerable amount of masking. Therefore, in a separate experiment, we measured the threshold of our listeners for detecting amplitude modulation of a single-frequency component in isolation. If the sounds are gated for one-half-second durations, the thresholds are about 5-10 dB lower than those found in Condition 3. If the tone is continuously present, as it was in Zwicker’s study, then the thresholds for detecting a half-second of modulation are another 5-10 dB lower, depending on carrier frequency and modulation rate. At the lowest modulation rates, our listeners are also less sensitive than Zwicker’s, by between 2 and 7 dB, the largest discrepancies being for the 1000-Hz carrier. In short, some of the differences between Condition 3 and existing data on the detection of amplitude modulation of single sinusoids are due to the presence of the other, nonmodulated, components. But a larger part of the difference is due to individual differences in the observers and to the mode of stimulus presentation (gated versus continuous listening). We are presently studying these differences in greater detail.
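The equality of $20\log(a_s/a)$ and $20\log m$ follows from adding the signal to the unmodulated component in phase: $a\cos(2\pi f_c t) + a_s\cos(2\pi f_m t)\cos(2\pi f_c t) = a[1 + (a_s/a)\cos(2\pi f_m t)]\cos(2\pi f_c t)$, so $m = a_s/a$. The sketch below checks this identity numerically, assuming the Condition 3 signal has this suppressed-carrier form; the amplitudes and frequencies are hypothetical. It also converts Zwicker's approximate low-rate thresholds from decibels back to modulation depth.

```python
import math

a, a_s = 1.0, 0.05            # hypothetical component and signal amplitudes
f_c, f_m = 1000.0, 10.0       # hypothetical carrier and modulation frequencies (Hz)
TWO_PI = 2 * math.pi

def component_plus_signal(t):
    """Unmodulated component plus an assumed suppressed-carrier signal, added in phase."""
    carrier = math.cos(TWO_PI * f_c * t)
    return a * carrier + a_s * math.cos(TWO_PI * f_m * t) * carrier

def single_tone_am(t, m):
    """Sinusoid of amplitude a, amplitude modulated with depth m."""
    return a * (1 + m * math.cos(TWO_PI * f_m * t)) * math.cos(TWO_PI * f_c * t)

m = a_s / a
max_err = max(abs(component_plus_signal(t) - single_tone_am(t, m))
              for t in (i / 48000 for i in range(24000)))   # 0.5 s at 48 kHz
print(f"m = {m:.3f} ({20 * math.log10(m):.1f} dB), max difference {max_err:.1e}")

# Approximate low-rate thresholds quoted from Zwicker (1959), expressed as depth m.
for thr_db in (-25, -30, -35):
    print(f"{thr_db} dB -> m = {10 ** (thr_db / 20):.3f}")
```

The two waveforms agree to rounding error, so in the absence of the other components Condition 3 reduces to ordinary single-tone modulation detection.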


We can also compare the results obtained in the different conditions in an attempt to infer the most effective cues to the presence of the signal. In the stationary profile condition, only the amplitude of the signal component is different from the amplitude of the other components of the multicomponent standard. Let us call this a ‘relative-level’ cue. The detectability of this cue is of central concern for most profile-analysis experiments. Condition 1 adds a dynamic component to this simple situation, because both the modulation of the signal and the relative-level cue are present. Condition 2 makes the dynamic cue less important, since amplitude modulation is present in the standard stimulus as well as in the standard-plus-signal. Condition 3, at least at the higher rates of modulation, makes the relative-level cue unimportant, since, at the highest rates, the signal component is equal in amplitude to all other components of the complex.

Comparison of these conditions, however, does not reveal any general rules about the relative effectiveness of different cues that hold for all three signal frequencies. Compared to the stationary profile condition, simple amplitude modulation of the signal (Condition 1) does not greatly change the detectability of the signal. Condition 2, which reduces the importance of the modulation cue, raises the signal threshold considerably for the middle-frequency signal (1000 Hz), but does not greatly affect the highest and lowest signal frequencies. Condition 3, which emphasizes the purely temporal cue, is generally ineffective at low rates of modulation for the lowest and middle signal frequencies. At the highest modulation rates, Condition 3 generally becomes ineffective. In those conditions where this is not true, for example the 640-Hz modulation at the 4257-Hz signal frequency, detection of the sidebands has probably occurred.

Multiple-component  signals 

Stimuli 

In our second experiment, we altered the amplitude of many components of the 21-component complex. Once again a stationary profile condition was compared with two dynamic conditions. In this stationary condition, alternate components of the spectrum were increased or decreased in amplitude. Thus, the listener’s task was to discriminate a flat multicomponent spectrum from a ‘serrated’ spectrum, one in which the amplitudes of successive components were alternately higher and lower than the average. We know from previous research that listeners can detect such alternations over the entire spectrum more easily than when the same amplitude change occurs at only a single component. In the dynamic conditions, two experimental manipulations were employed. The two conditions are precisely described in the Appendix, where the equations used to generate the waveforms are presented. Fig. 3 provides a graphic presentation of the stationary condition and the two dynamic stimuli.

Fig. 3. Schematic of the spectra used in the multiple-component-signal experiment. Only five components of the 21-component spectra are represented. The left panels are the spectra of the standard alone. The right panels are the spectra of the signal-plus-standard. The amplitude modulation condition is represented by sidebands near the central component. The arrows on the sidebands represent the phase of the signal modulation re the standard modulation (see Appendix). The signal amplitudes indicated are typical of those found in the experiment.

Once more, we use a logarithmic scale of amplitude and frequency. The standard 21-component complex is represented in the panel to the left, and the effects of adding the signal to the complex are represented in the panel on the right. Once more, only five components of the full 21-component complex are represented: the first, sixth, eleventh, sixteenth, and twenty-first components. The modulation frequency illustrated is 50 Hz, and the signal is shown at a threshold value of −18 dB.

In Condition 4, the stationary condition, the standard (top left, Fig. 3) is a set of 21 sinusoidal components, all equal in amplitude. The addition of the signal to the standard causes alternate components of the standard to be increased and decreased in level (top right, Fig. 3). Thus, the listener must discriminate a flat from an alternating spectrum. No amplitude modulation of either the standard or signal is present in this condition.

In Condition 5, the standard (middle left, Fig. 3) is a set of 21 sinusoidal components, all equal in amplitude. For this condition, the signal is simply an amplitude-modulated version of the standard, $A[1 + \cos(2\pi f_m t)]$. Adding that signal to the standard (in phase) produces the resulting spectrum (middle right, Fig. 3). The result is the standard spectrum with a slight amplitude modulation of all components. In the long-term spectrum, there is a small increase in the amplitude of all components and two sidebands located about each component frequency, at plus and minus the modulation frequency, $f_m$. The sidebands are low in amplitude (−24 dB re the carrier). At low rates of modulation, this condition is similar to the stationary profile condition, except that the amplitude of all components varies between $a + 2k$ and $a$ at the frequency of modulation. Note that the components of the complex are spaced at equal distances on a logarithmic frequency scale and the sidebands are a constant linear distance from these components. Thus, the relative logarithmic separation of the sidebands changes systematically with frequency, as illustrated in the figure.
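The frequencies quoted elsewhere in this paper (235, 276, 1000, and 4257 Hz for the 21-component complex) are consistent with components spaced equally on a logarithmic scale from 200 to 5000 Hz. The sketch below assumes that spacing (an inference, not stated in this section) to show how a fixed linear sideband offset of $f_m = 50$ Hz shrinks on a logarithmic axis as component frequency rises. It also checks the quoted sideband level: each sideband of $A[1 + \cos(2\pi f_m t)]$ has amplitude $A/2$, so with the signal at −18 dB the sidebands fall about 6 dB lower.

```python
import math

def log_spaced(n, lo=200.0, hi=5000.0):
    """n frequencies equally spaced on a log axis from lo to hi (assumed endpoints)."""
    return [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]

freqs = log_spaced(21)
f_m = 50.0                                   # modulation frequency illustrated in Fig. 3
seps = []
for f in (freqs[0], freqs[10], freqs[20]):   # first, eleventh, twenty-first components
    sep = math.log2((f + f_m) / f)           # upper-sideband offset in octaves
    seps.append(sep)
    print(f"{f:6.0f} Hz: sidebands at {f - f_m:6.0f} and {f + f_m:6.0f} Hz, "
          f"upper offset {sep:.3f} octave")

# Each sideband of A[1 + cos(2 pi f_m t)] has amplitude A/2, i.e. about 6 dB below A.
signal_db = -18.0                            # signal value shown in Fig. 3 (A re a)
sideband_db = signal_db + 20 * math.log10(0.5)
print(f"sidebands at about {sideband_db:.1f} dB re a")
```

The offset falls from about a third of an octave at 200 Hz to about 0.014 octave at 5000 Hz, and the sideband level comes out near −24 dB, matching the value given above.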

In Condition 6, the standard (bottom left, Fig. 3) is a set of 21 sinusoidal components, all equal in amplitude. For this condition, the signal is again an amplitude-modulated version of the standard. However, alternate components of this complex are modulated in different phases. Even components are modulated by $A[1 + \cos(2\pi f_m t)]$, whereas odd components are modulated by $A[1 - \cos(2\pi f_m t)]$. The resulting long-term spectrum (bottom right, Fig. 3) shows a small increase in the amplitude of the signal component and two sidebands for each component that alternate in phase (shown by the arrows pointing up or down for alternate components). At low rates of modulation, the 21 components of the complex wax ($a + 2k$) and wane ($a$) in amplitude, with alternate components out of phase.
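At low modulation rates the two dynamic conditions can be pictured through the slowly varying amplitude of each component. The sketch assumes the envelope forms described above, $a + k[1 \pm \cos(2\pi f_m t)]$, with hypothetical values $a = 1$, $k = 0.126$ (about −18 dB re $a$) and $f_m = 2$ Hz; `envelope` is an illustrative helper, not the Appendix's notation. In Condition 5 every component swings between $a$ and $a + 2k$ together, while in Condition 6 neighboring components swing in opposite directions, so their sum stays constant.

```python
import math

a, k, f_m = 1.0, 0.126, 2.0   # hypothetical: k about -18 dB re a, slow 2-Hz modulation

def envelope(index, t, condition):
    """Amplitude of component `index` of the signal-plus-standard at time t.
    Condition 5: a + k[1 + cos(2 pi f_m t)] for every component.
    Condition 6: even components use 1 + cos, odd components 1 - cos (assumed)."""
    c = math.cos(2 * math.pi * f_m * t)
    if condition == 6 and index % 2 == 1:
        c = -c
    return a + k * (1 + c)

for t in (0.0, 0.125, 0.25):      # peak, midpoint, trough of the 2-Hz envelope
    c5 = [envelope(i, t, 5) for i in (0, 1)]
    c6 = [envelope(i, t, 6) for i in (0, 1)]
    print(f"t = {t:.3f} s  Condition 5: {c5[0]:.3f}, {c5[1]:.3f}  "
          f"Condition 6: {c6[0]:.3f}, {c6[1]:.3f}")
```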

Results  and  discussion 

The results are presented in Fig. 4. For each experimental condition, the threshold for the signal was determined at nine different modulation frequencies, ranging from 2 to 640 Hz. The signal threshold, averaged over three observers, is plotted as a function of modulation frequency in the separate panels of the figure. Again, the major trends are well represented in the average data. The error bars represent the standard error of the mean threshold (twelve threshold determinations from each of the three observers).

The threshold of the signal is expressed as the level of the amplitude of the signal component, $k$ (see Appendix), re the level of the standard component to which it is added in phase, $20\log(k/a)$. We have not corrected this threshold value to take account of the change in signal power created by the modulation. One should be wary in comparing the thresholds for the multiple-signal conditions with those of the single-component conditions reported in the preceding experiment. In this multiple-signal experiment, the same ‘signal’ amplitude, $k$, has been used to alter 21 components, rather than a single component, $a_s$. Thus, there are, in effect, a total of 21 signals rather than the single component of the former experiment. If we measured the total signal energy or power, it would be 13 dB greater for the 21-component signal, but no attempt has been made to ‘equalize’ the signal threshold. Indeed, there is still some doubt as to how to accomplish that objective (Green, 1986; Green et al., 1987; Bernstein and Green, 1987).

Fig. 4. Results of the multiple-component-signal experiment. The ordinate is the threshold for the signal, $20\log(k/a)$ (see Appendix). The abscissa is the frequency of modulation, $f_m$. The data for the two experimental conditions are coded: square, Condition 5; triangle, Condition 6. The horizontal line (and number) is the threshold for the stationary profile condition. Each data point is the average threshold for the three observers.
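The 13-dB figure is the power sum of 21 equal-amplitude components, $10\log 21 \approx 13.2$ dB. The sketch below checks it by brute force, assuming an integer-Hz approximation of a log-spaced 200-5000 Hz complex (an assumption inferred from the quoted component frequencies) so that the cross terms between components average exactly to zero over a 1-s window.

```python
import math

SAMPLE_RATE = 12000          # Hz; 1-s analysis window, above twice the top component
# Assumed integer-Hz approximation of a log-spaced 200-5000 Hz, 21-component complex
freqs = [round(200 * 25 ** (i / 20)) for i in range(21)]

def mean_power(frequencies):
    """Mean-square value of a sum of unit-amplitude cosines over one second."""
    total = 0.0
    for n in range(SAMPLE_RATE):
        x = sum(math.cos(2 * math.pi * f * n / SAMPLE_RATE) for f in frequencies)
        total += x * x
    return total / SAMPLE_RATE

ratio_db = 10 * math.log10(mean_power(freqs) / mean_power(freqs[:1]))
print(f"21 components re 1 component: {ratio_db:.1f} dB")
print(f"10 log 21 = {10 * math.log10(21):.1f} dB")
```

Because the components lie at distinct frequencies, their powers simply add, which is why the total grows by $10\log N$ rather than $20\log N$.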

The two modulation conditions are coded in Fig. 4: a square and a triangle represent the thresholds obtained for Conditions 5 and 6, respectively. The solid horizontal line represents the threshold obtained in the stationary profile condition (Condition 4). Once more, we remind the reader that the sounds were presented for a duration of one-half second. This brief duration may inflate the threshold values for the slower modulation rates.

Again, the results show that a change in spectral shape, presented in a dynamic mode, does not improve the ability to detect a change in spectral shape over the same amount of change presented in a stationary spectrum. Generally, the thresholds are best for the stationary, saw-tooth condition, represented in the figure by the solid horizontal line (−26.5 dB). If one corrects for the increase in signal power created by the amplitude modulation of the signal (1.7 dB), thus elevating all the threshold points by 1.7 dB, the discrepancy widens. The dynamic conditions produce higher signal thresholds, even if a ‘corrected’ threshold quantity is calculated.

As the modulation rate increases, the spectral changes become increasingly difficult to hear. In the first experiment, at the higher rates of modulation, audible sidebands probably did occur. In these experiments, at such high rates of modulation, the sidebands were largely inaudible since they moved into adjacent masking components. Thus, as the frequency of modulation increases, the shape of the long-term power spectrum of standard and signal-plus-standard is essentially the same, and the threshold for the signal increases. That the sidebands were of some importance is demonstrated at the highest modulation frequencies. Detection of the sidebands is probably the primary reason that the signal can be heard in Condition 5, since the spectrum is essentially flat once the temporal variation within a channel is lost. The phase of the relative modulation appears to lose importance at a relatively low modulation frequency, 20-40 Hz, since at that frequency Conditions 5 and 6 produce very similar thresholds. The different threshold estimates seen for these conditions at modulation rates of 160 and 320 Hz probably reflect the different interactions of sidebands of similar frequency but opposite phase. That temporal variation in the spectrum is effective only at relatively low rates of modulation (below 40 Hz) is a conclusion also suggested by the next experiment.

Time-varying  signal  and  nonsignal 

Stimuli 

In this third experiment, we explore a slightly different aspect of the question of how temporal variation in the spectrum influences profile analysis. Here we concentrate on the relative coherence of modulation in what we call the ‘signal’ and ‘nonsignal’ parts of the spectrum. Detecting a change in spectral shape, if the change occurs at a single spectral locus, requires a simultaneous comparison of intensity information across different frequency channels. An obvious question is, how does temporal variation within the signal and nonsignal channels influence such comparisons? Note that the nonsignal part of the spectrum can be considered as a kind of amplitude standard, against which comparisons of the amplitude in the signal channels can be made. If both signal and nonsignal channels vary coherently, then detection of a change in the relative level of the signal channel should proceed smoothly. But suppose the nonsignal part of the spectrum varies out of phase with the signal part. How will this lack of amplitude coherence influence the detectability of a change? The specific question we asked was: if the signal and nonsignal levels are both amplitude modulated, will the relative phase between the envelopes of the two modulation waveforms influence detection performance?

In these experiments, the standard is our usual multicomponent complex. In one experiment, it contained 21 components; in the other, 7 components. The signal was a single component of the standard that was added in phase to the standard, thus producing a small increment in that component of the standard. We call the component at the signal frequency the ‘signal component.’ The amplitude of this component will have the value $a$ if the standard alone is present and the value $a + a_s$ if the signal is added to the standard. The other components (20 or 6 in number) we call the ‘nonsignal components.’ They all have amplitude $a$, independent of whether or not the signal is present. We now multiply each of these two waveforms, the signal component and the nonsignal components, by a modulation waveform, $m(t)$, where

$$m(t) = 1 + \sin[2\pi f_m t + \theta(s)]$$

The rate of modulation, $f_m$, is in cycles per second. The phase of the modulation, $\theta$, depends on the waveform being modulated, $s$. Suppose both signal and nonsignal components are modulated using the same value of $\theta$. We call this the ‘in-phase’ modulation condition. In that case, the levels of the signal and nonsignal components wax and wane together. Likewise, we can choose different values of $\theta$ for the signal and the nonsignal components. A phase difference of 180° between the two values of $\theta$ is referred to as the ‘out-of-phase’ condition. In that case, the signal amplitude increases while the amplitude of the nonsignal components is decreasing, and vice versa.
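The consequence of the two phase conditions for a simultaneous level comparison can be sketched by tracking the ratio of the signal-component envelope, $(a + a_s)\,m(t)$, to the nonsignal envelope, $a\,m(t)$, over one modulation period. The amplitudes and the 2-Hz rate below are hypothetical, and the function names are illustrative. In phase, the ratio is constant at $(a + a_s)/a$; out of phase, it sweeps from near zero to very large values, so no single instant offers a stable comparison.

```python
import math

a, a_s, f_m = 1.0, 0.3, 2.0                  # hypothetical amplitudes and a slow 2-Hz rate
times = [i / 1000 for i in range(1, 500)]    # one modulation period, 1-ms steps

def m(t, theta):
    """Modulation waveform m(t) = 1 + sin(2*pi*f_m*t + theta)."""
    return 1 + math.sin(2 * math.pi * f_m * t + theta)

def level_ratios(theta_signal, theta_nonsignal):
    """Signal-component envelope divided by nonsignal-component envelope at each instant."""
    ratios = []
    for t in times:
        nonsignal = a * m(t, theta_nonsignal)
        if nonsignal > 1e-6:                 # skip instants where the nonsignal envelope vanishes
            ratios.append((a + a_s) * m(t, theta_signal) / nonsignal)
    return ratios

in_phase = level_ratios(0.0, 0.0)
out_of_phase = level_ratios(0.0, math.pi)
print(f"in phase:     ratio {min(in_phase):.3f} to {max(in_phase):.3f}")
print(f"out of phase: ratio {min(out_of_phase):.3f} to {max(out_of_phase):.1f}")
```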

Because of the randomization of presentation level, listening only for overall intensity at any component is a poor detection strategy. To obtain good detection performance, one must compare the level of the signal component with that of the nonsignal components on each particular presentation, even though the levels vary at the modulation rate during each stimulus presentation. The task is always the same: to determine whether there is a relative increment in a single component of an otherwise flat spectrum.


The relative detectability of an increment at three different frequency locations was measured using both the 21- and 7-component standards. The signal frequencies were the second, middle, and penultimate components of the complex: 235, 1000, or 4257 Hz for the 21-component complex, or 342, 1000, or 2924 Hz for the 7-component complex. The rate of modulation ranged between 2 and 160 Hz.

Results  and  discussion 

Fig. 5 presents the results for the 21-component standard; Fig. 6 presents the results for the 7-component standard. The threshold for the signal is plotted along the ordinate, and the rate of modulation, $f_m$, is plotted along the abscissa. The threshold values are averages across three observers; two of the three had participated in the previous experiments, and the third subject listened only to these conditions. The square symbols code the in-phase condition, and the triangles code the out-of-phase condition. The solid horizontal line in the figure represents the threshold for the signal in a stationary profile condition, in which the signal is unmodulated and simply increases the amplitude of the signal component to $a + a_s$ during the entire observation interval. The slight differences in average threshold value from those reported in the previous experiment arise for two reasons. First, one listener is different. Second, these two measurements were taken several months apart; thus, the observers common to both measurements had more experience in the task, and their thresholds had improved slightly. One subject improved by an average of about 4 dB, the other by 2.5 dB. Such long-term improvement is characteristic of profile experiments (Kidd et al., 1987).

The general form of the results is sensible. At very slow rates of modulation, for example 2 Hz, it is difficult to detect the signal in the out-of-phase condition. We presume this occurs because the nonsignal components provide little basis for a simultaneous comparison of signal and nonsignal levels; the nonsignal components are nearly absent when the signal component is in the vicinity of a maximum, and vice versa. Without a simultaneous level cue, the observer is forced to listen to overall level, which is a very poor cue, given that the presentation level is selected at random from a 20-dB range. If overall level is the only cue, then, given a 20-dB range, one can calculate that the threshold will increase to about +3 dB using the (3-down, 1-up) adaptive rule (Green, 1987, p. 20).

Fig. 5. Results for the experiment, using a 21-component complex, on the relative phase of signal and nonsignal components. The ordinate and abscissa are the same as those used in Figs. 2 and 4. The threshold values when signal and nonsignal components are in phase (square) and out of phase (triangle) are plotted. The signal frequencies are indicated. The horizontal line (and number) in each panel is the threshold for the stationary profile condition. Each data point is the average threshold for the three observers.
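The +3-dB figure depends on the details of the calculation in Green (1987), which is not reproduced here. What can be sketched is the tracking rule itself: a 3-down, 1-up rule steps down only after three consecutive correct responses and up after any error, so it settles where three in a row is as likely as not, $p^3 = 0.5$, or about 79.4% correct. The simulation below, with a hypothetical helper `down_step_fraction`, checks that at this probability half of all level changes are downward.

```python
import random

p_target = 0.5 ** (1 / 3)        # P(three correct in a row) = 0.5
print(f"3-down, 1-up tracks {100 * p_target:.1f}% correct")

def down_step_fraction(p_correct, n_steps=20000, seed=1):
    """Simulate a 3-down, 1-up staircase with a fixed probability of a correct
    response and return the fraction of level changes that are downward."""
    rng = random.Random(seed)
    downs = ups = run = 0
    while downs + ups < n_steps:
        if rng.random() < p_correct:
            run += 1
            if run == 3:          # three consecutive correct: step down
                downs += 1
                run = 0
        else:                     # any error: step up
            ups += 1
            run = 0
    return downs / (downs + ups)

print(f"at {p_target:.3f} correct, {down_step_fraction(p_target):.3f} of steps are downward")
```

With overall level as the only cue, the track must climb until the signal adds enough level to lift performance to this point despite the 20-dB rove, which is why the threshold rises so far.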

For the in-phase condition, the signal and nonsignal components wax and wane together, and, hence, a simultaneous comparison of levels is possible. As a general rule, the threshold for the in-phase condition is nearly the same as that obtained with the stationary (unmodulated) condition, independent of the rate of modulation. The largest exception seems to be the lowest-frequency signal (235 Hz) for the 21-component profile where, for modulation frequencies less than 20 Hz, all the in-phase thresholds seem to be about 5 dB poorer than in the unmodulated condition.

Fig. 6. Same as Fig. 5 except a 7-component complex was used.

In terms of detecting a spectral change, only when the sidebands produced by the modulation interact with the major components of the spectrum does there appear to be a difference in the threshold for the unmodulated and modulated conditions. When this interaction occurs, the spectrum sounds ‘rough’. Depending on the rate of modulation, the signal frequency, and the density of the components in the spectrum, this roughness appears to cause a small elevation in threshold for the in-phase conditions. For example, in Fig. 5, the small bump in the data for the 235-Hz signal frequency (21-component) condition at a modulation frequency of 40 Hz is where the sidebands of the first (200 Hz) and third (276 Hz) spectral components are very close in frequency to the signal (second) component (235 Hz). Similarly, the upturn in the data for the 1000-Hz signal frequency (21-component) condition at a modulation frequency of 160 Hz is where the sidebands of the nearest components fall close to the signal frequency. Indeed, the general lack of such effects in the 7-component data is consistent with this explanation. For the data of the 7-component complex (Fig. 6), there is little difference between the threshold measured in the in-phase condition and the stationary condition at any modulation frequency. The only departure from this rule occurs at the lowest signal frequency (342 Hz), where the data appear to drift upward for the highest rates of modulation. In that case, the explanation appears to be some interaction between the most closely spaced components, 200 and 342 Hz.
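The sideband coincidences invoked here can be checked under the assumption, inferred from the quoted frequencies, that the components are log-spaced from 200 to 5000 Hz. The hypothetical helper `nearest_sideband` finds, for a given signal component and modulation rate, the closest sideband ($f \pm f_m$) contributed by any other component.

```python
def log_spaced(n, lo=200.0, hi=5000.0):
    """n frequencies equally spaced on a log axis from lo to hi (assumed endpoints)."""
    return [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]

def nearest_sideband(n, sig_idx, f_m):
    """Return (distance_hz, component_index, sideband_hz) for the sideband of any
    other component (f - f_m or f + f_m) that lies closest to the signal component."""
    freqs = log_spaced(n)
    f_sig = freqs[sig_idx]
    best = None
    for i, f in enumerate(freqs):
        if i == sig_idx:
            continue                   # the signal's own sidebands are not counted
        for sb in (f - f_m, f + f_m):
            d = abs(sb - f_sig)
            if best is None or d < best[0]:
                best = (d, i, sb)
    return best

for n, idx, fm in ((21, 1, 40.0), (21, 10, 160.0)):
    d, i, sb = nearest_sideband(n, idx, fm)
    f_sig = log_spaced(n)[idx]
    print(f"signal {f_sig:.0f} Hz, f_m = {fm:.0f} Hz: sideband at {sb:.0f} Hz "
          f"from component {i + 1}, {d:.1f} Hz away")
```

For the 235-Hz signal at 40 Hz, the lower sideband of the 276-Hz component lands about 1 Hz away; for the 1000-Hz signal at 160 Hz, the upper sideband of the component near 851 Hz lands about 11 Hz away.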

While there is a clear difference between the thresholds obtained in the in-phase and out-of-phase conditions for the lower modulation rates, once the frequency of modulation exceeds about 20-40 Hz, the thresholds for the two conditions are nearly the same. This result is easy to understand in terms of a simple filter model. Suppose the rate of modulation is so fast that the sidebands fall outside the filter centered on the modulated component. When such a condition occurs, the output of the filter is constant and shows no variation in amplitude produced by the modulation. When the amplitude variation is no longer present, the relative phase between the signal component and any other components of the spectrum ceases to be important. Thus, at the highest rates of modulation, the phase angle, $\theta$, should be irrelevant, and the thresholds for the in-phase and out-of-phase conditions should be nearly the same, as they are.

Note, however, that the modulation rate at which phase becomes irrelevant is about 20-40 Hz and is largely independent of the density of the spectrum or the signal frequency. Indeed, for the 7-component complex, the modulation rate at which the in-phase and out-of-phase conditions yield nearly equal thresholds is about the same for all three signal frequencies.

This last observation raises a potential problem with the preceding filter explanation. A simple application of this filter idea would suggest that the modulation frequency at which the modulation phase becomes irrelevant should vary systematically with signal frequency. The reason for this expectation is straightforward. We know that the widths of the frequency channels in the auditory system vary systematically with center frequency. One estimate of the critical band (Zwicker, 1961) is about 16% of the center frequency, that is, a bandwidth of about 32 Hz at 200 Hz, 160 Hz at 1000 Hz, and 640 Hz at 4000 Hz. Thus, as the frequency of modulation is varied, a critical band centered at a lower frequency, because it has a smaller bandwidth, will produce a nearly steady (carrier) tone at much smaller rates of modulation than a band centered at a higher frequency. Assume the filter output is nearly constant when the sidebands fall at the edges of the filter passband. The sidebands lie at the carrier frequency plus and minus the modulation frequency, so this occurs when the modulation frequency equals half the bandwidth. Then a frequency of modulation of 16 Hz at a center frequency of 200 Hz is equivalent to an 80-Hz modulation rate at a center frequency of 1000 Hz, or a 320-Hz modulation rate at a center frequency of 4000 Hz. The modulation rate producing an equivalent change in filter output is proportional to center frequency.
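The band-edge arithmetic above can be checked in a few lines. This is a minimal sketch under the text's own assumptions: a critical bandwidth of 16% of center frequency, and a filter output that is nearly steady once the sidebands reach the band edges, so the "equivalent" modulation rate is half the bandwidth. The function names are hypothetical.

```python
# Critical-band arithmetic from the text.  Assumptions (taken from the text):
# bandwidth = 16% of center frequency (Zwicker, 1961), and the filter output is
# nearly steady once the sidebands at fc +/- fm reach the band edges.

CB_FRACTION = 0.16  # critical bandwidth as a fraction of center frequency


def critical_bandwidth(fc_hz: float) -> float:
    """Approximate critical bandwidth (Hz) at center frequency fc_hz."""
    return CB_FRACTION * fc_hz


def equivalent_modulation_rate(fc_hz: float) -> float:
    """Modulation rate (Hz) that puts both sidebands at the band edges,
    i.e. half the critical bandwidth."""
    return critical_bandwidth(fc_hz) / 2.0


if __name__ == "__main__":
    for fc in (200.0, 1000.0, 4000.0):
        print(f"fc = {fc:6.0f} Hz   CB = {critical_bandwidth(fc):5.0f} Hz   "
              f"equivalent fm = {equivalent_modulation_rate(fc):5.0f} Hz")
```

The same rule gives about 468 Hz at 2924 Hz. This is the "nearly 500 Hz" band used in the 2924-Hz example that follows.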

One should, however, be cautious in interpreting how this fact will affect the detectability of the signal in this experiment. A profile experiment must involve a comparison of the signal level with the nonsignal level. Previous work on profile analysis has shown that the comparison process is not restricted to the locally adjacent critical bands (Green et al., 1984). Thus, interpreting how modulation frequency and signal frequency will interact is complicated. Consider one condition, the threshold for the 2924-Hz signal frequency with a 7-component complex. The effects of modulation phase become negligible for a modulation rate of about 20 Hz. At 2924 Hz, the critical band is nearly 500 Hz wide, so sidebands only 20 Hz from the signal frequency fall well inside it. Thus, the temporal fluctuation in the signal channel should be considerable. Yet the relative phase of that modulation, between the signal and nonsignal channels, is irrelevant. The only explanation we can offer is as follows. Assume that the level of some nonsignal channels (presumably much lower in frequency) is being used as a basis of comparison with the level in the signal channel. The level in these channels is essentially constant, because their center frequencies, and hence bandwidths, are much smaller. Therefore, because the nonsignal channel is not changing in level, the relative phase of fluctuations in the 2924-Hz channel is irrelevant.
Relative phase of the modulation

In this experiment, we systematically measured (in steps of 45°) how different phase angles between the two modulators affect the detectability of the signal. While we have described the comparison of different levels in a profile task as simultaneous, it is possible that some small time is taken to actually compare the two levels. We have used the word ‘simultaneous’ to distinguish the process from the successive comparison of levels across the intervals of the forced-choice procedure, a process that involves time measured in seconds. The ‘simultaneous’ comparison of profile analysis may occupy a few milliseconds, even if it occurs within a single observation interval. We wished to determine whether this time was measurable. Therefore, we varied the relative phase of modulation between the signal and nonsignal components in much finer steps than the two phases (0° and 180°) used in our previous conditions. To measure the time of comparison precisely, we would like to use a high frequency of modulation so that changes in phase would reflect small time differences. But our dependent variable is the difference in threshold for two phase conditions. Thus, we must use a frequency of modulation that produces some measurable difference. As our compromise, we selected the 1000-Hz signal (21-component) condition with a modulation rate of 5 Hz. That condition produces a difference in threshold of about 15 dB between the in-phase and out-of-phase conditions (Fig. 5, middle panel). Unfortunately, the 5-Hz modulation rate is equivalent to a period of 200 ms, so our time scale for phase effects is relatively gross.

Fig. 7. Results for the relative signal-nonsignal phase experiment. A 21-component complex was used. The signal frequency was 1000 Hz and the frequency of modulation was 5 Hz. The ordinate is the threshold for the signal. The abscissa is the relative phase between the modulator for the signal and nonsignal components. Zero degrees means the signal and nonsignal components are in phase, 180° the out-of-phase condition. The average of the three observers is plotted.

Fig.  7  shows  the  results.  We  express  the  in-phase 
condition  as  zero  or  360°  and  the  out-of-phase 
condition  as  180°.  The  signal  threshold  for  each 
of  these  phase  conditions  is  plotted  along  the 
ordinate.  (The  threshold  at  360°  is  simply  the 
zero  point,  replotted.)  As  can  be  seen,  the  in-phase 
condition  appears  to  provide  the  lowest  threshold, 
and  the  180°  phase  angle  produces  the  highest 
threshold.  Intermediate  phase  angles  fall  along  a 
relatively smooth curve. Any small delay in the comparison process must be either zero or smaller than about 25 ms. That is the equivalent of a 45° phase change, one-eighth of the 200-ms modulation period.
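A phase shift of the modulator is a fixed fraction of one modulation period. The time resolution of this experiment therefore follows directly from the 5-Hz rate. This is a minimal sketch of that conversion; the function name is ours, not the authors'.

```python
def phase_to_delay_ms(phase_deg: float, fm_hz: float) -> float:
    """Time shift (ms) equivalent to a modulator phase shift at rate fm_hz."""
    period_ms = 1000.0 / fm_hz          # one modulation cycle in ms
    return period_ms * phase_deg / 360.0


if __name__ == "__main__":
    # The 45-degree steps of the Fig. 7 experiment at a 5-Hz modulation rate.
    for phase in range(0, 361, 45):
        print(f"{phase:3d} deg -> {phase_to_delay_ms(phase, 5.0):6.1f} ms")
```

Raising the modulation rate shrinks every step proportionally. This is why the text would prefer a higher rate if it still produced a measurable in-phase/out-of-phase difference.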

Conclusion 

The detection task common to all these experiments is to discriminate a change in the spectral shape of a multicomponent complex.

While the salience of individual components of a multicomponent complex can be enhanced by amplitude modulation, the detectability of amplitude changes in such components is not greatly influenced. For amplitude changes in single components of a multicomponent complex, amplitude modulation appears to provide a consistent benefit only at the highest signal frequency (4257 Hz) (Fig. 2). For amplitude changes over multiple components, dynamic conditions generally produce much higher (poorer) thresholds than a stationary condition.

The relative coherence of the signal and nonsignal components of the spectrum is important, but only at the lower modulation rates ($f_m < 40$ Hz). Above this rate of modulation, the relative phase of different parts of the spectrum is unimportant (Figs. 5 and 6).

Appendix 

The following are the equations used to generate the stimuli in the first two experiments. We refer to the standard as St(t), and the signal as S(t). These waveforms were sampled, digitized, and stored in buffers, which were then played to the listeners through the D-to-A converters. The signal-plus-standard waveform was created by adding the two buffers, representing the signal and standard waveforms, point by point.

Experiment 1. Single-component signal

For all conditions in this experiment, the parameter α is adjusted to estimate the signal threshold.

Stationary Profile Condition

$$\mathrm{St}(t) = \sum_{i=1}^{21} a\cos(2\pi f_i t + \theta_i)$$

$$S(t) = \alpha\cos(2\pi f_j t + \theta_j)$$

Dynamic Conditions
Condition 1

$$\mathrm{St}(t) = \sum_{i=1}^{21} a\cos(2\pi f_i t + \theta_i)$$

$$S(t) = \alpha\,[1 - \cos(2\pi f_m t)]\cos(2\pi f_j t + \theta_j)$$

Dynamic Spectral Changes
Condition 2

$$\mathrm{St}(t) = \sum_{i=1,\, i \ne j}^{21} a\cos(2\pi f_i t + \theta_i) + a\,[1 - \cos(2\pi f_m t)]\cos(2\pi f_j t + \theta_j)$$

$$S(t) = \alpha\,[1 - \cos(2\pi f_m t)]\cos(2\pi f_j t + \theta_j)$$

Condition 3

$$\mathrm{St}(t) = \sum_{i=1}^{21} a\cos(2\pi f_i t + \theta_i)$$

$$S(t) = \alpha\cos(2\pi f_m t)\cos(2\pi f_j t + \theta_j)$$

Experiment 2. Multiple component signals

For all conditions in this experiment, the parameter k is adjusted to estimate threshold.

Stationary profile condition
Condition 4

$$\mathrm{St}(t) = \sum_{i=1}^{21} a\cos(2\pi f_i t + \theta_i)$$

$$S(t) = \sum_{i=1}^{21} k\,(-1)^i\cos(2\pi f_i t + \theta_i)$$

Dynamic conditions
Condition 5

$$\mathrm{St}(t) = \sum_{i=1}^{21} a\cos(2\pi f_i t + \theta_i)$$

$$S(t) = \sum_{i=1}^{21} k\,[1 - \cos(2\pi f_m t)]\cos(2\pi f_i t + \theta_i)$$

Condition 6

$$\mathrm{St}(t) = \sum_{i=1}^{21} a\cos(2\pi f_i t + \theta_i)$$

$$S(t) = \sum_{i=1}^{21} k\,[1 + (-1)^i\cos(2\pi f_m t)]\cos(2\pi f_i t + \theta_i)$$
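To make the appendix equations concrete, here is a sketch that synthesizes the standard and the Condition 1 and Condition 6 signals with NumPy, then adds the buffers point by point as the text describes. Several values are not given in this appendix and are assumptions:

- the component grid (21 components equally spaced in log frequency from 200 to 5000 Hz);
- the 25-kHz sample rate;
- the 100-ms duration;
- the random phases θ_i;
- the example signal index, α, k, and f_m.

The function names are hypothetical.

```python
import numpy as np

# Assumed values: the appendix does not specify these.
FS = 25_000          # sample rate (Hz)
DUR_S = 0.1          # stimulus duration (s)
N_COMP = 21          # number of components


def component_freqs(f_lo=200.0, f_hi=5000.0, n=N_COMP):
    """Equal logarithmic spacing between f_lo and f_hi (assumed grid)."""
    return np.geomspace(f_lo, f_hi, n)


def carriers(t, freqs, phases):
    """Rows are cos(2 pi f_i t + theta_i), one row per component."""
    return np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None])


def standard(t, freqs, phases, a=1.0):
    """St(t) = sum_i a cos(2 pi f_i t + theta_i)."""
    return a * carriers(t, freqs, phases).sum(axis=0)


def signal_condition1(t, freqs, phases, j, alpha, fm):
    """S(t) = alpha [1 - cos(2 pi fm t)] cos(2 pi f_j t + theta_j)."""
    envelope = 1.0 - np.cos(2 * np.pi * fm * t)
    return alpha * envelope * np.cos(2 * np.pi * freqs[j] * t + phases[j])


def signal_condition6(t, freqs, phases, k, fm):
    """S(t) = sum_i k [1 + (-1)^i cos(2 pi fm t)] cos(2 pi f_i t + theta_i)."""
    i = np.arange(1, len(freqs) + 1)            # 1-based index, as in the sums
    envelope = 1.0 + ((-1.0) ** i)[:, None] * np.cos(2 * np.pi * fm * t)
    return k * (envelope * carriers(t, freqs, phases)).sum(axis=0)


rng = np.random.default_rng(0)
t = np.arange(round(FS * DUR_S)) / FS
freqs = component_freqs()
phases = rng.uniform(0.0, 2 * np.pi, N_COMP)

st = standard(t, freqs, phases)
# Signal-plus-standard: the two buffers are added point by point.
st_plus_s = st + signal_condition1(t, freqs, phases, j=10, alpha=0.1, fm=5.0)
```

In Condition 6, the factor (-1)^i makes odd- and even-numbered components fluctuate in opposite phase. The spectral shape therefore alternates over each modulation cycle.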
Seven experiments on the detectability of intensity changes in complex multitonal acoustic spectra are reported. Two general questions organize the experimental efforts. The first question is how the detectability of a change in a flat (equal-energy) spectrum depends on the frequency region where a single intensive change is made. The answer is that frequency region plays a relatively minor role. Changes in the midregion of the spectrum are the easiest to hear, but thresholds increase by only about 5 dB over the range from 200 to 5000 Hz.

For all frequencies, the psychometric function is of the form d' = k(Δp), where k is a constant and Δp is the change in pressure. The second question is how we can predict the detectability of complex changes over the entire frequency range from the detectability of change at each separate region. We measured thresholds for detecting a change from a flat spectrum to a spectrum whose amplitude varies in sinusoidal ("rippled") fashion over logarithmic frequency, at different frequencies of ripple. The thresholds are found to be independent of ripple frequency and are 7 dB higher than predicted on the basis of an optimum combination rule.

PACS numbers: 43.66.Ba, 43.66.Dc, 43.66.Fe, 43.66.Jh [RDS]


INTRODUCTION 

In several previous papers, we have reported on the ability of listeners to detect alterations in the shape of complex acoustic spectra. Often the standard stimulus was a multicomponent spectrum composed of equal-amplitude sinusoids. The change was created by increasing the intensity of one component of the standard. In this paper, we systematically explore the question of how the frequency of the altered component affects the ability to detect the change. Is it easier to detect a low- or high-frequency change in the intensity profile of a complex stimulus? There are several reasons for asking such a question, but the one we will stress here is the empirical one. We will need this information to address the second question of this paper: how do we use the data on the detectability of changes in local regions of the spectra to predict the detectability of more complex changes?

In a previous study, we measured how the detectability of an increase in a single component changed as a function of the frequency of the component (Green and Mason, 1985). In general, those results suggested that a change in the intensity of the midfrequency region, 500 to 2000 Hz, produced superior performance, but the variability among observers was sizable. Also, those data may have been contaminated by prior practice because the subjects had participated in an earlier experiment in which the change occurred in this midfrequency region. Although extensive training was given for all frequency regions, it is conceivable that the effects of the earlier practice influenced the data. In the present study, we used the recent move of our laboratory as an opportunity to recruit a set of listeners who had no previous training at any one frequency region.

Once we have studied how the detectability of a change in a single region of the spectrum varies with component frequency, we are ready to consider spectral changes of a more complicated variety. Suppose the listener were trying to detect spectral change at many component frequencies. Is it possible to develop a simple rule to account for the detection of these more complicated changes? The more complicated spectral change that we investigate is a sinusoidal alteration over the entire spectral profile, a sinusoidal ripple over logarithmic frequency. The frequency of this ripple is then varied and detection performance assessed for a number of ripple frequencies. The results obtained with the various ripple frequencies are easy to summarize: they all produce about the same threshold. This threshold, however, is about 7 dB higher than would be expected on the basis of an optimum combination of the detectabilities at the local regions.

I.  GENERAL  PROCEDURE 

In  all  the  experiments,  the  listener’s  task  was  to  detect  a 
change  in  the  spectral  shape  of  a  complex  multicomponent 
waveform.  The  components  of  the  standard  were  always  of 
equal  amplitude  and  always  equally  spaced  on  a  logarithmic 
frequency scale. The phase of each component of the standard was chosen at random, and this phase was used for all presentations of this condition. The standard spectrum was altered in shape by changing the intensity of one or more of the sinusoidal components. This alteration can be thought of as adding a “signal” waveform to the standard. Thus the discrimination task was to distinguish between the standard alone and the standard with the signal added to it. The overall level of the sounds, standard or standard-plus-signal, was varied on each and every presentation according to a random schedule. The observers were thus forced to detect a change in the shape of the standard spectra, rather than simply a change in intensity at some region of frequency.

All waveforms were generated digitally, played over digital-to-analog converters at a sample rate of 25 000 Hz, and low-pass filtered at 10 000 Hz. The duration of the sounds differed in the different experiments, but all were turned on and off with a 5-ms raised cosine window. The observers were seated in sound-treated (IAC double-walled) rooms and the stimuli were presented binaurally over TDH-39 earphones, both phones driven in phase.
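The generation steps above can be sketched with NumPy. The sketch covers equal-amplitude components with fixed random phases, an optional in-phase increment on one component, a random overall level per presentation, and 5-ms raised-cosine ramps at 25 kHz. Some choices are not stated in the text and are assumptions: the rove distribution (uniform over ±10 dB around the median) and the relative, uncalibrated amplitude units. The function names are hypothetical.

```python
import numpy as np

FS = 25_000            # sample rate from the text (Hz)
RAMP_S = 0.005         # 5-ms raised-cosine on/off ramps, from the text


def raised_cosine_gate(n_samples: int, fs: int = FS, ramp_s: float = RAMP_S) -> np.ndarray:
    """Unit gate with raised-cosine onset and offset ramps."""
    n_ramp = int(round(ramp_s * fs))
    gate = np.ones(n_samples)
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    gate[:n_ramp] = ramp
    gate[n_samples - n_ramp:] = ramp[::-1]
    return gate


def make_interval(freqs, phases, signal_amp=None, signal_index=None,
                  dur_s=0.1, rove_db=20.0, rng=None):
    """One observation interval: the standard (plus an optional in-phase
    increment), scaled by a random overall level and gated.
    Amplitudes are relative (1.0 per standard component), not calibrated SPL;
    the uniform rove distribution is an assumption."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(round(dur_s * FS))) / FS
    amps = np.ones(len(freqs))
    if signal_index is not None:
        amps[signal_index] += signal_amp          # signal added in phase
    x = (amps[:, None] * np.cos(2 * np.pi * np.asarray(freqs)[:, None] * t
                                + np.asarray(phases)[:, None])).sum(axis=0)
    level_db = rng.uniform(-rove_db / 2, rove_db / 2)   # rove around the median
    return x * 10 ** (level_db / 20) * raised_cosine_gate(len(t))
```

The same phases are reused for every presentation of a condition, as the text specifies. Only the overall level and the presence of the increment change from interval to interval.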

A two-alternative forced-choice procedure was used with an adaptive, two-down, one-up technique to estimate a signal level corresponding to a 0.707 probability of correct choice (d' = 0.76). The initial step size of 4 dB was halved after the first four reversals. Trials were run in blocks of 50. The estimated threshold was computed as the average of the remaining pairs of reversals after excluding the first three reversals. Typically, 10 to 16 reversals occurred within each block. For a given stimulus condition, 6 runs of 50 trials were run in succession. Each trial lasted about 2 s, and it took about 15 min to complete 6 runs of 50 trials. All of the data reported here were based on two or three separate replications; that is, average thresholds were based on 12 or 18 fifty-trial blocks.
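The adaptive rule can be sketched as a small function. The 4-dB start step, halving after four reversals, 50 trials, and discarding the first three reversals come from the text. One detail is our reading of "remaining pairs": after dropping three reversals, we drop one more if needed to leave an even number. The starting level and the deterministic test observer below are hypothetical.

```python
from typing import Callable, List, Tuple


def two_down_one_up(respond: Callable[[float], bool],
                    start_db: float = 0.0,
                    step_db: float = 4.0,
                    n_trials: int = 50,
                    halve_after: int = 4,
                    discard: int = 3) -> Tuple[float, List[float]]:
    """Run one block of a two-down, one-up staircase on signal level (dB).

    respond(level) returns True for a correct 2AFC response.
    Returns (threshold estimate, list of reversal levels).
    """
    level, step = start_db, step_db
    correct_run = 0          # consecutive correct responses at this level
    last_dir = 0             # +1 = last move up, -1 = down, 0 = no move yet
    reversals: List[float] = []
    for _ in range(n_trials):
        if respond(level):
            correct_run += 1
            if correct_run < 2:
                continue                 # need two correct before moving down
            correct_run, direction = 0, -1
        else:
            correct_run, direction = 0, +1
        if last_dir and direction != last_dir:
            reversals.append(level)      # direction changed: record reversal
            if len(reversals) == halve_after:
                step /= 2.0              # 4 dB -> 2 dB after four reversals
        last_dir = direction
        level += direction * step
    usable = reversals[discard:]
    if len(usable) % 2:                  # keep an even number (whole pairs)
        usable = usable[1:]
    if not usable:
        raise ValueError("too few reversals for a threshold estimate")
    return sum(usable) / len(usable), reversals
```

For example, `two_down_one_up(lambda level: level > -15.0)` simulates an idealized observer who is always correct above -15 dB and always wrong below. The staircase then oscillates between -16 and -14 dB, and the estimate lands near -15 dB. A real observer responds probabilistically, and the two-down, one-up rule then converges on about 70.7% correct.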

Normal-hearing observers participated in the experiments. They were college students recruited through advertisements placed in the student employment office and the music and speech departments. They were paid at an hourly rate for their services and were given a special bonus upon completing the entire sequence of measurements.


II.  SINGLE  SIGNAL  COMPONENT— EFFECTS  OF 
FREQUENCY  LOCATION 

A. Single component signal in 21-component profile

The “standard” for this experiment was a complex of 21 equal-amplitude components spaced equally on a logarithmic scale of frequency. The lowest frequency component was 200 Hz, the highest was 5000 Hz, and the ratio of the frequencies of successive components in the spectrum was 1.1746. The level of the standard varied between trials over a range of 20 dB and the median sound-pressure level of the standard was 40 dB per component. Because there were 21 equal-level components in the complex, the overall level was about 13 dB higher (10 log 21 ≈ 13.2 dB), or 53 dB SPL.
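The spacing ratio and overall level quoted above follow from two short formulas. The ratio is (f_high/f_low)^(1/(n-1)) for n components, and the levels of equal-power components add as 10 log n. The seven signal frequencies listed next appear to be every third component starting with the second, truncated to whole hertz. That reading is our inference, which the sketch checks to within 1 Hz.

```python
import math


def log_spaced(f_lo: float, f_hi: float, n: int):
    """Return (ratio, frequencies) for n components equally spaced in log f."""
    ratio = (f_hi / f_lo) ** (1.0 / (n - 1))
    return ratio, [f_lo * ratio ** i for i in range(n)]


ratio, freqs = log_spaced(200.0, 5000.0, 21)
overall_db = 40.0 + 10.0 * math.log10(21)   # 21 equal-power components at 40 dB

print(f"ratio = {ratio:.4f}")
print(f"overall level = {overall_db:.1f} dB SPL")
print([round(freqs[i], 1) for i in range(1, 20, 3)])   # every third component
```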

The signal was a single sinusoid added in phase to one component of the standard. A threshold was measured for detecting this increment at each of seven different frequencies: 234, 380, 617, 1000, 1620, 2626, and 4256 Hz. The stimulus duration was 100 ms. We report the average threshold over six listeners, based on twelve 50-trial determinations of threshold at each frequency.

Figure 1 presents the results of this experiment. The value along the abscissa is the frequency of the component to which the signal was added. The value along the ordinate is the size of the signal at threshold. It is measured as the signal amplitude re: the amplitude of the component of the standard to which the signal is added (in phase). For example, if the signal were the same size as the component of the standard, the threshold would be reported as 0 dB. If the signal were 1/10 the amplitude of the component of the standard to which it is added, the threshold would be -20 dB. As can be seen, the detection of the increment does vary somewhat with signal frequency. The midfrequency region, 500 to 2000 Hz, produces the best detection. Increments in the flat spectrum outside this frequency region are somewhat more difficult to detect, but the difference never exceeds 10 dB. The error bars are the standard error of the mean computed over the 72 threshold estimates made at each frequency (6 observers and 12 threshold estimates per observer). The average data, shown in the figure, are typical of all the observers. The results are similar to those obtained by Green and Mason (1985). The function of Fig. 1 is smoother and shows slightly less variation with frequency than was found in the earlier study.

FIG. 1. Discrimination of an increment added to a single component of an equal-amplitude, 21-component standard waveform with a frequency range of 200 to 5000 Hz. The abscissa shows the frequency of the incremented component (signal) and the ordinate is the threshold for 70% correct discrimination of the signal. Thresholds are the ratio of the level of the signal increment to the level of a single component of the standard in dB. Error bars are the standard error computed over 6 subjects with 12 runs each.
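The ordinate of Fig. 1 is an amplitude ratio in dB, and this sketch shows the conversion. Because the signal is added in phase, its amplitude sums directly with that of the standard component. The second function therefore gives the resulting level change of that component. This is an illustration we add, not a quantity the paper plots, and the function names are hypothetical.

```python
import math


def threshold_db(signal_amp: float, component_amp: float = 1.0) -> float:
    """Signal threshold re: the standard component it is added to (dB)."""
    return 20.0 * math.log10(signal_amp / component_amp)


def component_level_change_db(thr_db: float) -> float:
    """Level change of the incremented component when the signal is added
    in phase, so the two amplitudes simply sum."""
    return 20.0 * math.log10(1.0 + 10.0 ** (thr_db / 20.0))


if __name__ == "__main__":
    for amp in (1.0, 0.1):
        thr = threshold_db(amp)
        print(f"signal amp {amp}: threshold {thr:.0f} dB, "
              f"component grows by {component_level_change_db(thr):.2f} dB")
```

A -20 dB threshold, for instance, corresponds to raising one component by less than 1 dB.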

B.  Effects  of  overall  intensity  and  duration 

The stimulus conditions were similar to those employed in experiment 1, except that the overall intensity level of the stimuli was increased 20 dB. The median standard level was thus 60 dB SPL per component rather than the 40-dB level used in the previous experiment. Also, two presentation durations were studied: 100 ms, as in the previous experiment, and 30 ms. Three observers participated in this experiment; only one had participated in the first experiment. These three observers participated in all the remaining experiments.

Figure 2 presents the results of this experiment. The quantities plotted on the ordinate and abscissa are the same as in Fig. 1. The threshold values for the 30-ms presentation duration are shown by open circles (the upper dashed curve), while the 100-ms data are plotted as open triangles (the lower curve). The solid line segments are the results obtained in the first experiment with a 100-ms duration and lower intensity level.

The 100-ms presentation duration produces lower signal thresholds than the 30-ms presentation duration at almost all frequencies. We are puzzled by the two thresholds being the same at the highest signal frequency (4256 Hz) and suspect this coincidence is chance fluctuation. At all other frequencies, except 380 Hz, the difference in threshold for the two durations is nearly the same. The average difference in threshold, over all frequencies, is 3.6 dB. This value is only slightly smaller than the value of 5 dB that one would expect from an equal-energy rule. Our definition of the signal threshold is proportional to the level of the signal. Thus a change in duration by a factor of about three (100 ms versus 30 ms) would necessitate a change in signal power by the same factor, or about 5 dB, to hold signal energy constant. The equal-energy rule has received empirical support in a previous paper (Green et al., 1984).

FIG. 2. Effect of level and signal duration on threshold. Conditions were the same as in Fig. 1 (shown as solid line), except that intensity levels of the stimuli were increased 20 dB to a median level of 60 dB per component. In one condition, signal duration was the same as in experiment 1 (100 ms; triangles); in another condition, 30 ms was used for signal duration (circles). Error bars are standard errors computed over 18 runs for three subjects.
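The equal-energy prediction is a one-line calculation. Energy is power times duration, so keeping it constant at the shorter duration requires a level increase of 10 log(T_long/T_short) dB. This is a minimal sketch with a hypothetical function name; the observed 3.6-dB average is taken from the text.

```python
import math


def equal_energy_shift_db(long_ms: float, short_ms: float) -> float:
    """Signal-level increase (dB) needed at the shorter duration to keep
    signal energy (power x duration) constant."""
    return 10.0 * math.log10(long_ms / short_ms)


predicted = equal_energy_shift_db(100.0, 30.0)
observed = 3.6   # average 30-ms minus 100-ms threshold difference, from the text
print(f"predicted {predicted:.1f} dB, observed {observed:.1f} dB")
```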

Detection thresholds for this experiment are generally similar, as a function of frequency, to those obtained in the first experiment (solid line segments). The only difference worth comment is that, whereas the first experiment showed a shallow bowl-like curve, the results of the second experiment show less of an increase in threshold at the lower frequencies. Averaging the data at the two durations would show a nearly flat function for the lower and midfrequency region and a slight increase at the highest frequency. It is unknown whether this difference between the two experiments arises from differences in observers or from these observers' additional practice in this detection task. We believe that training may play some role. In our experience, there is a very slow improvement in the ability to hear the spectral change in the lower frequency region that is not evident for the higher frequencies.

One purpose of this experiment was to determine if the effects of frequency were altered appreciably by a presentation duration of 30 ms, which is shorter than the latency of the acoustic reflex. Such does not seem to be the case, and we can rule out the acoustic reflex as playing any significant role in the studies that employed longer presentation durations.

C.  Frequency  context 

In the two preceding experiments, the best thresholds occur for the middle frequencies of the standard, between 500 and 2000 Hz. Does this reflect greater sensitivity at these frequencies, or are these thresholds lower because this region is in the center of the standard? To answer this question, we generated two 21-component standard stimuli with only slightly overlapping frequency ranges. The “low-frequency” standard ranged in frequency between 200 and 2000 Hz. The ratio of frequencies of successive components of the standard was 1.122. The “high-frequency” standard ranged in frequency between 1000 and 10 000 Hz and had the same ratio between successive frequency components as the low-frequency set. For each standard, we measured the threshold for an increment in a single component of the standard, in either a relatively low-, middle-, or high-frequency region of that standard. The three signal frequencies were 224, 632, and 1782 Hz for the low-frequency standard; the signal frequencies were 1122, 3162, and 8912 Hz for the high-frequency standard. The other conditions were similar to those used in the first experiment. The component level of the standard was 40 dB and the presentation duration was 100 ms. The thresholds were based on eighteen 50-trial runs.
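A spacing ratio of 1.122 is 10^(1/20), the ratio that places 21 components across exactly one decade. The quoted signal frequencies appear to be the second, eleventh (middle), and twentieth components of each standard. That reading is our inference, and this sketch checks it to within 1 Hz.

```python
ratio = 10.0 ** (1.0 / 20.0)      # 21 components spanning one decade

low = [200.0 * ratio ** i for i in range(21)]     # 200-2000 Hz standard
high = [1000.0 * ratio ** i for i in range(21)]   # 1000-10 000 Hz standard
picks = (1, 10, 19)                                # 0-based component indices

print(round(ratio, 4))
print([round(low[i], 1) for i in picks])
print([round(high[i], 1) for i in picks])
```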

Figure 3 presents the result of this experiment. The ordinate and abscissa are the same as those used in Fig. 1. The thresholds for the three signal frequencies in the lower frequency standard are shown as the open circles. The thresholds for the three signal frequencies in the higher frequency standard are shown as the open triangles. The curve depicted by the solid line segments is the result obtained in the first experiment (with frequency range from 200 to 5000 Hz).

For the lower frequency standard, the middle-frequency signal is the easiest to hear and either end of the frequency range produces higher thresholds, a result consistent with the finding in the first experiment. In fact, the two lower frequency signals have thresholds remarkably similar to those obtained with the wider frequency complex employed in the first experiment. The threshold for the upper frequency signal, 1782 Hz, however, is nearly 10 dB higher than that determined for the wider frequency complex. This result presumably reflects the effects of context, that is, the relative location of the signal frequency within the standard complex. Similar effects of context have been reported in previous papers (Green and Mason, 1985).

FIG. 3. Effect of frequency range on discrimination of an increment on a single sinusoid in a 21-component spectrum. The conditions were similar to those used in the first experiment (solid line) with the frequency range of the spectrum changed. The low-frequency standard, shown as circles, ranged in frequency from 200 to 2000 Hz. The high-frequency standard (triangles) ranged from 1000 to 10 000 Hz. Error bars are the standard errors computed over 12 runs for three subjects.

For the high-frequency complex, the middle-frequency signal is not the easiest to detect. It appears that at these higher frequencies, the frequency of the signal component per se exerts a stronger influence on the signal’s threshold than does context. The effect of context is again evident if we compare the thresholds obtained in this experiment with those obtained in the first experiment (solid curve). The 200- to 5000-Hz standard includes components below 1000 Hz. At every frequency where a comparison can be made, it produces lower thresholds than the complex extending from 1000 to 10 000 Hz.

As a simple summary, we may say that signals in the middle of a standard are generally easier to detect than signals located at the extremes, provided the entire range is located below about 5000 Hz. Above this frequency, the absolute frequency of the signal may play a larger role than the effects of context.

D.  Extended  frequency  range 

In this experiment, we used a standard with as wide a frequency range as is practically possible. The standard in this experiment was a 30-component complex ranging in frequency from 200 to 10 000 Hz. The median level of the components was 50 dB SPL, and the ratio of the frequencies of successive components was 1.144. The signal presentation duration was 100 ms.

The addition of the “signal” produced a change in five adjacent components of the standard. If we number these five components successively starting with the lowest frequency, then 3 is the middle component of the set. The odd components of this set, 1, 3, and 5, were increased in amplitude and the even components, 2 and 4, were decreased in amplitude. Thus the observers were discriminating between two stimuli: the standard with a flat (equal-amplitude) spectrum, or the signal-plus-standard with a five-component ripple located at some frequency region within the flat complex. The frequency region of this ripple was the independent variable of the experiment. The threshold for this ripple was measured at six different regions. Each region was specified by the frequency of the middle component of the five-component complex: 261, 514, 1009, 1981, 3889, and 7635 Hz. The thresholds are based on twelve 50-trial runs.
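This sketch builds the 30-component grid and the five-component ripple described above. The six listed center frequencies appear to be components 2, 7, 12, 17, 22, and 27 (0-based) of a grid with ratio (10 000/200)^(1/29), roughly 1.144. That index mapping is our inference, checked here to within 1 Hz. The helper name and the unit standard amplitude are hypothetical.

```python
N = 30
ratio = (10_000.0 / 200.0) ** (1.0 / (N - 1))   # about 1.144
freqs = [200.0 * ratio ** i for i in range(N)]


def ripple_amplitudes(center: int, delta: float, n: int = N):
    """Per-component amplitudes of standard + five-component ripple.

    Components center-2, center, center+2 (ripple members 1, 3, 5) are
    incremented by delta; center-1 and center+1 (members 2, 4) are
    decremented by the same amount.  Standard amplitude is 1.0."""
    if not 2 <= center <= n - 3:
        raise ValueError("ripple must fit inside the standard")
    amps = [1.0] * n
    for offset, sign in zip(range(-2, 3), (+1, -1, +1, -1, +1)):
        amps[center + offset] += sign * delta
    return amps


centers = [2, 7, 12, 17, 22, 27]   # indices that reproduce the listed centers
print(round(ratio, 4), [round(freqs[c], 1) for c in centers])
```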

Figure  4  presents  the  result  of  this  experiment.  The 
curve  depicted  by  the  solid  line  segments  are  the  data  from 
the  first  experiment.  We  should  comment  on  how  threshold 
values  are  computed  for  these  five-component  signals.  We 
have  plotted  the  threshold  on  a  single-component  basis,  thus 
—  20  dB  means  that  the  amplitude  of  all  five  signal  compo¬ 
nents  is  1/10  the  amplitude  of  the  standard  component  to 
which  it  is  added  (or  subtracted ).  We  have  never  conducted 
a  formal  experiment  comparing  increments  and  decrements, 
but  informal  testing  has  convinced  us  that  the  detectability 
of  a  fixed  signal  amplitude  is  not  very  different  whether  we 
add  or  subtract  it  from  a  component  of  the  standard. 

FIG. 4. Discrimination of a 5-component ripple as a function of the frequency of the center component of the ripple in a 21-component standard with a 200- to 10 000-Hz range. Ripples were on successive components of the standard, with phases such that, when added to the standard, the first, third, and fifth components of the ripple were incremented, while the other two components were decremented by a like amount. Thresholds are the level of the increment relative to the level of a single component of the standard, in dB.

If total signal energy were used as the measure of threshold, instead of our single-component measure, every threshold would be about 7 dB higher, because the five-component ripple carries about 7 dB more energy than any single component. If all the data points were increased by 7 dB, nearly all would fall above the solid line, which represents the threshold for the single-component signal used in that experiment. Thus we find that the single-component signal is the easiest signal to detect on an energy basis. A similar conclusion was reached by Green and Kidd (1983).
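The conversion between the two threshold measures can be checked with a short sketch. It assumes, as the text implies, that the five ripple components have equal magnitude, so their powers add regardless of whether each is an increment or a decrement; the helper names are illustrative only.

```python
import math


def single_component_db(amplitude_ratio: float) -> float:
    """Threshold on the single-component basis: 20*log10(signal / standard amplitude)."""
    return 20.0 * math.log10(amplitude_ratio)


def total_energy_db(single_db: float, n_components: int) -> float:
    """Re-express a per-component level as total signal energy, assuming the
    n components have equal magnitude so that their powers add."""
    return single_db + 10.0 * math.log10(n_components)


# A signal amplitude 1/10 that of the standard component is -20 dB.
per_component = single_component_db(0.1)
# Five equal components carry 10*log10(5), about 7 dB, more energy than one.
energy_basis = total_energy_db(per_component, 5)
print(round(per_component, 1), round(energy_basis - per_component, 2))  # -20.0 6.99
```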

The general shape of the function is similar to what we found in the other experiments. The middle-frequency region is, once again, the easiest in which to detect a change in the spectrum, with the higher frequencies being much worse, especially at the extreme frequencies.

E.  Summary  of  frequency  location  experiments 

In general, the results of all these experiments exploring the frequency locus of the change in shape of a complex spectrum reveal no strong effects of frequency. When threshold is plotted as a function of frequency, the data resemble a shallow bowl with the minimum located in the moderate-frequency range, 500 to 2000 Hz. At the very highest frequencies, the signal can be more difficult to detect by as much as 10 to 13 dB, but no abrupt changes in this function are evident. Only at frequencies as high as 7000 Hz does the ability to detect changes in a complex spectrum appear to deteriorate substantially.

III.  DETECTION  OF  COMPLEX  SPECTRAL  CHANGES 

In the next series of experiments, we turn our attention to the detection of complex alterations in spectral shape. To predict the detectability of such complicated changes, we hoped to use a rule based on the detectability of changes at individual components. To implement such a scheme, we first need to know the trading relation between signal amplitude and signal detectability; that is, we need to know the psychometric functions for changes in individual components.

A.  Psychometric  function 

In this experiment, we estimate the psychometric functions for the detection of an increment in a single component at three frequencies, 380, 1000, and 2626 Hz. For a given frequency, the two alternatives of each two-interval forced-choice trial were the standard alone and the standard plus one of three fixed signal levels. All three signal levels occurred with equal probability within a single listening session of 100 trials, so that signal level would not be confounded with trial block. The signal levels were chosen on the basis of a prior estimate of threshold obtained using the adaptive procedure. The middle signal level was set to produce about 75% correct, and the other two signal levels were set 6 dB above and below this value. Ten 100-trial runs were used to estimate the psychometric function at each frequency, so that about 333 trials were used to estimate the percentage of correct judgments at each of the three signal levels.

Actually, two psychometric functions were estimated, in two different experimental conditions, at each of the three signal frequencies. One was a profile condition: the signal component was presented with 20 other components, and the overall level was randomly varied (50 ± 10 dB SPL). In the second condition, the single component was presented in isolation at a fixed level (60 dB SPL), so we are estimating the psychometric function for a simple intensity-discrimination task. Stimulus duration was 100 ms.

Psychometric functions for the simple pure-tone intensity-discrimination task and for the profile task are shown in Figs. 5 and 6, respectively. The data for the three listeners and three signal frequencies are presented in each figure. The data shown in the figures were obtained by the following procedure. First, we converted the percentage of correct responses at each signal level to d'. Next, for each individual condition and listener, we plotted three data points, the value of 20 log d' versus signal level (20 log signal pressure). These data were then fit with a line having a slope of unity and one free parameter, the signal pressure that produced a d' of 1. For each listener and condition, we let this pressure be 0 dB. In this way, all of the data for all conditions and listeners could be plotted on a single graph, as in Figs. 5 and 6.
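The two analysis steps, converting percent correct to d' and fitting a unit-slope line to 20 log d' versus signal level, can be sketched as follows. The sketch assumes the standard unbiased two-interval forced-choice relation d' = √2·z(Pc) (Green and Swets, 1966); the excerpt does not spell out which conversion was used, and the example levels and percentages are hypothetical.

```python
import math
from statistics import NormalDist


def dprime_2ifc(p_correct: float) -> float:
    """d' for an unbiased two-interval forced-choice task: sqrt(2) * z(Pc)."""
    if not 0.5 < p_correct < 1.0:
        raise ValueError("Pc must lie strictly between 0.5 and 1 for a positive, finite d'")
    return math.sqrt(2.0) * NormalDist().inv_cdf(p_correct)


def fit_unit_slope(levels_db, p_corrects):
    """Fit 20*log10(d') = level - L0 with the slope fixed at 1.

    With one free parameter, least squares reduces to the mean of
    (level - 20*log10(d')). L0 is the signal level (dB) at which d' = 1.
    """
    offsets = [lvl - 20.0 * math.log10(dprime_2ifc(p))
               for lvl, p in zip(levels_db, p_corrects)]
    return sum(offsets) / len(offsets)


# Hypothetical data: three levels 6 dB apart (not values from the paper).
levels = [-26.0, -20.0, -14.0]
pcs = [0.63, 0.77, 0.90]
l0 = fit_unit_slope(levels, pcs)
# Re-reference each level so that d' = 1 falls at 0 dB, as in Figs. 5 and 6.
print([round(lvl - l0, 1) for lvl in levels])
```

Fixing the slope at unity is what lets every listener and condition collapse onto one 0-dB reference point.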

As these figures show, the detectability of the signal increases monotonically with the level of the signal. The average slope measured for the ten 100-trial runs across all conditions (subjects and frequencies) is 0.97 for the profile condition (Fig. 6) and 0.75 for the intensity-discrimination condition (Fig. 5). For the latter condition, previous experiments have found a slope value close to unity for well-practiced listeners (Green, 1960). The low value for the slope in this condition probably reflects a lack of sufficient training in that condition. Our listeners had spent most of their time listening to profile conditions, and they all complained about the difficulty of the pure intensity-discrimination experiment. One observer summarized his frustration by saying, “The only thing you can listen for is a difference in loudness.”

FIG. 5. Psychometric function for intensity discrimination for a single subject at each of three signal frequencies and levels. The functions were adjusted so that d' = 1 occurs at 0 dB for each condition.

Although the linear relation between d' and signal pressure provides a very good approximation to the data, other expressions are also consistent with the data. Another suggestion for the form of the psychometric function is that d' is proportional to the difference in level (ΔL) between the standard component and the standard-plus-increment (Rabinowitz et al., 1976). In this formulation,

$$d' = k\,\Delta L = k\,10\log\left(1 + \frac{\Delta I}{I}\right) = k\,20\log\left(1 + \frac{\Delta p}{p}\right),$$

where k is a constant that depends on the experimental condition and the listener, I is the intensity (p the pressure) of the standard, and ΔI (or Δp) is the increment in intensity (or pressure). Recall that we always add the signal component in phase to the component of the standard, so that 1 + ΔI/I = (1 + Δp/p)². For small values of Δp/p, ΔL is approximately equal to 8.686 (Δp/p). The data of Fig. 6 show a linear relation between d' and Δp/p, but this could be interpreted equally well as implying a linear relation between d' and ΔL. The present data provide no way to choose between these different expressions for the form of the psychometric function.
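Why the two forms of the psychometric function (d' proportional to Δp/p, or to ΔL) are hard to tell apart can be seen numerically: for in-phase increments, ΔL = 20 log(1 + Δp/p), whose slope at Δp/p = 0 is 20/ln 10 ≈ 8.686 dB. The sketch compares the exact level difference with that linear approximation; the sample values of Δp/p are arbitrary.

```python
import math

# Slope of 20*log10(1 + x) at x = 0, in dB per unit of dp/p.
SMALL_SIGNAL_SLOPE = 20.0 / math.log(10.0)  # about 8.686


def delta_L_db(dp_over_p: float) -> float:
    """Level difference (dB) between standard and standard-plus-increment
    when the increment is added in phase."""
    return 20.0 * math.log10(1.0 + dp_over_p)


def delta_L_from_intensity_db(dp_over_p: float) -> float:
    """Same quantity via intensity: 1 + dI/I = (1 + dp/p)**2 for in-phase addition."""
    di_over_i = (1.0 + dp_over_p) ** 2 - 1.0
    return 10.0 * math.log10(1.0 + di_over_i)


for x in (0.01, 0.05, 0.1, 0.3):
    exact = delta_L_db(x)
    approx = SMALL_SIGNAL_SLOPE * x
    print(f"dp/p = {x:4.2f}: exact {exact:6.3f} dB, linear {approx:6.3f} dB, "
          f"error {100 * (approx - exact) / exact:5.1f}%")
```

Over the small increments typical of these thresholds, the two descriptions differ by only a few percent, which is within the scatter of three-point psychometric functions.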

FIG. 6. Psychometric function for discrimination of a change in spectral shape, the addition of an increment on a single component in a 21-component standard flat spectrum. The coordinates are the same as in Fig. 5.

FIG. 7. Waveforms showing three different frequencies k of sinusoidal variation in component amplitudes.


B.  Detection  of  rippled  spectra 

In this experiment, the standard waveform was the 21-component flat spectrum that ranged in frequency from 200 to 5000 Hz. The successive components were spaced equally on a logarithmic scale of frequency. When the signal waveform was added to the standard spectrum, it produced a resulting spectrum whose amplitude varied sinusoidally as a function of the logarithm of frequency, what we call a “rippled” spectrum. Figure 7 shows this manipulation graphically. The first spectrum shows a single cycle of sinusoidal variation in amplitude over our 21 components; the next spectrum shows two cycles of amplitude variation; and, finally, the last spectrum shows the highest rate of variation that can be achieved, since alternate components increase and decrease in amplitude.

Specifically, the “signal” waveform was produced by setting the amplitude of successive components according to the following equation:

$$a(i) = \sin\left(\frac{2\pi k i}{M}\right), \qquad i = 1, 2, \ldots, M,$$

where i is the number of the component, ranging in this case from 1 to M = 21; a(i) is the amplitude of the ith component of the signal spectrum; and k is the frequency of the ripple. Recall that the first component, i = 1, corresponds to a frequency of 200 Hz, and the last component, i = 21, corresponds to a frequency of 5000 Hz. We scaled the amplitude of this “signal” and added each component in phase (respecting sign) to the corresponding component of the flat standard spectrum to produce the change in the spectrum, as shown in Fig. 7. In that figure, the signal amplitude is about 20% of the standard amplitude. A cosine ripple can be constructed in the same manner by substituting the cosine function for the sine function in the equation.
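The construction of the standard and rippled spectra can be sketched as below. It assumes M = 21 components log-spaced from 200 to 5000 Hz and applies a(i) = sin(2πki/M) (or the cosine) scaled by a free factor; the functions are hypothetical helpers, and the scale in the usage line is illustrative, not a threshold from the study.

```python
import math


def component_frequencies(m: int = 21, f_lo: float = 200.0, f_hi: float = 5000.0):
    """M frequencies spaced equally on a logarithmic axis from f_lo to f_hi."""
    ratio = (f_hi / f_lo) ** (1.0 / (m - 1))
    return [f_lo * ratio ** n for n in range(m)]


def ripple_signal(k: int, m: int = 21, kind: str = "sine"):
    """Signal amplitudes a(i) = sin(2*pi*k*i/M) or cos(...), for i = 1..M."""
    fn = math.sin if kind == "sine" else math.cos
    return [fn(2.0 * math.pi * k * i / m) for i in range(1, m + 1)]


def rippled_spectrum(k: int, scale: float, m: int = 21, kind: str = "sine",
                     standard_amp: float = 1.0):
    """Add the scaled signal in phase, respecting sign, to each equal-amplitude
    standard component: a positive a(i) raises that component, a negative one lowers it."""
    return [standard_amp + scale * a for a in ripple_signal(k, m, kind)]


freqs = component_frequencies()
spectrum = rippled_spectrum(k=2, scale=0.2)  # illustrative scale only
for f, amp in zip(freqs[:4], spectrum[:4]):
    print(f"{f:7.1f} Hz  amplitude {amp:.3f}")
```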

It should be noted that, by constructing the signal in this way, the same 21 values occur for the set of amplitudes a(i), independent of the frequency of the ripple k. The parameter k simply reorders the set of 21 values. One result of this fact is that, for higher ripple frequencies, the spectrum appears to have a smoothing function imposed on the ripple; compare the highest-frequency ripple with the one- or two-cycle ripple in Fig. 7. Another consequence of this fact is that a quantity such as the root mean square (rms) of the amplitude values is independent of the frequency of the ripple. If the maximum value of a(i) is 1, the rms value is 0.707. A cosine ripple with the same scaling has the same rms value as the sine ripple, and this value is also independent of k.

The ripple produced in the spectrum by the addition of the signal to the standard waveform depends upon the ratio of the amplitudes of the signal components to those of the standard's equal-amplitude components. Thus it is convenient to use the rms value of the 21 signal components as our measure of signal amplitude. We define the signal-to-standard ratio as the ratio of the rms signal amplitude to the amplitude of any component of the standard. The depth of the ripple is, of course, monotonically related to the signal-to-standard ratio.
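Because the signal-to-standard ratio rests on the rms of the 21 signal amplitudes, the claim that this rms does not depend on k is worth checking. The sketch computes it for sine and cosine ripples with a unit peak coefficient. The rms equals 1/√2 for every integer k from 1 to 20, since 2k is never a multiple of 21. The stronger statement that k merely reorders the same 21 values holds exactly when k shares no factor with 21 (e.g., k = 1, 2, 4, 5, 8, 10). The helper names and the example scale are assumptions for illustration.

```python
import math


def ripple_values(k: int, m: int = 21, kind: str = "sine"):
    """a(i) = sin(2*pi*k*i/M) or cos(2*pi*k*i/M) for i = 1..M (unit peak coefficient)."""
    fn = math.sin if kind == "sine" else math.cos
    return [fn(2.0 * math.pi * k * i / m) for i in range(1, m + 1)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def signal_to_standard_db(scale: float, k: int, kind: str = "sine",
                          standard_amp: float = 1.0) -> float:
    """20*log10(rms signal amplitude / amplitude of one standard component)."""
    return 20.0 * math.log10(scale * rms(ripple_values(k, kind=kind)) / standard_amp)


for kind in ("sine", "cosine"):
    print(kind, [round(rms(ripple_values(k, kind=kind)), 4) for k in (1, 2, 5, 10)])
# each entry prints as 0.7071

# Scale factor whose rms signal sits at -24 dB re one standard component (illustrative).
scale = 10 ** (-24 / 20) / math.sqrt(0.5)
print(round(signal_to_standard_db(scale, k=4), 2))
```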

Table I lists the average thresholds measured for these rippled spectra (some sine and all cosine ripples) at different frequencies of ripple k. The threshold for the signal is measured in terms of the signal(rms)-to-standard ratio and is nearly constant, independent of the frequency of the ripple. That is, the different changes in spectral shape caused by varying k were equally detectable, and the thresholds were all about -24 dB. In only one instance, the k = 9 cosine ripple, did the threshold differ from the mean by more than 2 dB.

Let us now explore the question of how we might try to predict these data. Can we account for the detection of a rippled spectrum (an intensity change in several components of the complex) on the basis of the listener's ability to detect an intensity change in each individual component? In other words, can we predict the detection of a broad spectral change on the basis of the detection at each point along the spectrum? In experiment 2, we obtained data for the three listeners in a task requiring them to detect a change in a single component of a 21-component complex (Fig. 2, open triangles). We attempted to use those data (extrapolating the threshold from the seven measured frequencies to the remaining 14) to see whether the detectability of the rippled spectrum could be predicted from the detectability of the individual components. One of the simplest rules is the optimum combination rule, in which the d' for the combined signal is the square root of the sum of the individual d's squared (see Green and Swets, 1966, p. 239; also Green, 1958). In experiment 5, we found that d' is proportional to signal pressure (see Fig. 6); thus we can determine the pressure at each component needed to achieve the observed level of detectability for the rippled spectrum.
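The prediction step can be written compactly. Assume d'_i = s·|a_i|/t_i, where a_i is the ripple shape, s its overall scale, and t_i the relative pressure giving d' = 1 for an increment on component i alone. The optimum combination rule then gives d' = s·√Σ(a_i/t_i)², so the predicted threshold scale is s = 1/√Σ(a_i/t_i)². Using d' = 1 as the common threshold criterion, the equal thresholds of 0.15 (borrowed from the illustrative case in the text), and the function names are assumptions of this sketch.

```python
import math


def predicted_threshold_scale(shape, single_thresholds):
    """Ripple scale s at which the optimum combination rule gives d' = 1.

    shape[i]             -- ripple amplitude a(i) per unit scale (sign is irrelevant)
    single_thresholds[i] -- relative pressure giving d' = 1 on component i alone
    """
    total = math.sqrt(sum((a / t) ** 2 for a, t in zip(shape, single_thresholds)))
    return 1.0 / total


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


M = 21
shape = [math.sin(2 * math.pi * 1 * i / M) for i in range(1, M + 1)]  # 1-cycle sine ripple
thresholds = [0.15] * M  # hypothetical: equal single-component thresholds (about -16.5 dB)

s = predicted_threshold_scale(shape, thresholds)
predicted_db = 20 * math.log10(s * rms(shape))  # rms signal re one standard component
print(round(predicted_db, 1))  # about -29.7
```

With equal thresholds, the prediction reduces to 20 log 0.15 - 10 log 21; measured single-component thresholds would simply replace the constant list.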

Figure 8 shows the results of that calculation. The abscissa is the component number; 1 represents the 200-Hz component, and 21 represents the 5000-Hz component. The ordinate is the relative signal pressure (ratio of signal pressure to pressure of that component of the standard). The average threshold data for the single-increment task are plotted on the scale of relative pressure and shown by the open circles. These are the same data shown in Fig. 2 (open triangles), except that the data in that figure are plotted on a decibel scale. In addition, in Fig. 8, we have interpolated or extrapolated the threshold values for the missing frequencies. The relative threshold value at each component of the rippled spectrum is shown by the sinusoidal function marked with crosses. We have chosen the threshold data for a 1-cycle sine ripple; the thresholds for the different frequencies of ripple are so similar that it matters little which frequency we select (see Table I). The threshold predicted by the optimum combination rule is shown by the smaller-amplitude sinusoid indicated by the solid line. The difference between the predicted and obtained values is about 7 dB.

FIG. 8. Relative signal pressure to standard pressure as a function of the component number of the 21-component complex. The points marked by the open circles are the average threshold values for the detection of an increment on that component in a flat standard spectrum. The sinusoidal function marked by crosses is the threshold pressure for the detection of a ripple stimulus versus a flat spectrum. The sinusoidal function marked by the line is the predicted threshold pressure for this condition according to the optimum combination rule.

This is a sizable discrepancy, and it is not sensitive to details of the initial threshold values. To demonstrate the robust nature of this discrepancy, suppose the individual thresholds were all about equal and at a relative pressure value of 0.15 (a signal level 16 dB below the level of the standard). There are 21 components, so the square root of the sum of the squared, equal d's is √21 ≈ 4.6 times a single d', an expected improvement of 13.2 dB. The observed improvement is about 8 dB, from -16 to -24 dB, a discrepancy of 5.2 dB. Further, the direction of the discrepancy is similar to that found by Green and Kidd (1983). They compared an increment on a single component with increments added to all 21 components of the standard. The improvement in threshold was about 5 dB less than might have been expected from an optimum combination rule.
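The equal-threshold arithmetic in the paragraph above is easy to reproduce: N equal, independent d's combine to √N times a single d', which is 10 log N dB. The function name is illustrative.

```python
import math


def combination_gain_db(n: int) -> float:
    """Improvement predicted by the optimum combination rule for n equal,
    independent d' values: 20*log10(sqrt(n)) = 10*log10(n)."""
    return 20.0 * math.log10(math.sqrt(n))


expected = combination_gain_db(21)   # about 13.2 dB
observed = -16.0 - (-24.0)           # 8 dB, from the text
print(round(math.sqrt(21), 2), round(expected, 1), round(expected - observed, 1))  # 4.58 13.2 5.2
```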

If one maintains the optimum combination rule, then the only avenue of escape is to argue that there are not 21 independent d's that contribute to the detectability of the complex spectral change, but some lesser number. The reduction needed to fit the data is sizable. If we want an 8-dB improvement, rather than 13 dB, then we conclude that there are only 6.5 independent detectors contributing to the detection of the rippled spectrum. If we want only a 6-dB improvement, the number is 4 independent detectors. The assertion that there are only 4 or 6.5 independent detectors covering a frequency range of 4.6 oct (200 to 5000 Hz) would mean that these bands span 1 oct each (assuming 4 detectors) or 3/4 oct each (assuming 6.5 detectors). The width of the widest critical band estimates is about 1/5 oct. This means the profile-analysis band is between 4 and 2.5 times larger than a critical band. The assumption of such a wide profile-analysis band, however, is inconsistent with the mean threshold data. A 10-cycle ripple implies that a single cycle covers about 1/2 oct (4.6/10), so that both a peak and a valley of the ripple fall in the same analysis band. The threshold for the 10-cycle ripple should, therefore, be elevated with respect to the thresholds for all conditions with a lower frequency of ripple. The data (Table I) show that the threshold is virtually independent of ripple frequency. We clearly need to make independent estimates of the width of these analysis bands before we can accept these numbers.
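Inverting the same relation gives the number of equivalent independent detectors implied by an observed improvement, N = 10^(dB/10), and hence the bandwidth each would have to cover across the 200- to 5000-Hz range. This minimal sketch assumes equal contributions from every detector, as the argument above does. It yields about 6.3 and 4.0 detectors for 8 and 6 dB, close to the rounded values in the text.

```python
import math

SPAN_OCT = math.log2(5000.0 / 200.0)  # about 4.64 octaves


def equivalent_detectors(improvement_db: float) -> float:
    """Number N of equal, independent d' contributions that yields the given
    improvement under the optimum combination rule: 10*log10(N) = improvement."""
    return 10.0 ** (improvement_db / 10.0)


for improvement in (8.0, 6.0):
    n = equivalent_detectors(improvement)
    print(f"{improvement:.0f} dB -> N = {n:.1f} detectors, "
          f"{SPAN_OCT / n:.2f} oct per detector")
```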


C.  Effects  of  spectral  density 

The final question we wish to address is how the number of components in the spectrum might affect the ability to discriminate a flat from a rippled spectrum. In this last experiment, using the same three observers, we varied the number of components used to generate the spectrum for a single, low-frequency ripple, k = 2. As we varied the number of components in the spectrum, the logarithmic spacing was preserved; that is, the ratio of successive components in the spectrum was constant. Because the components always spanned 200 to 5000 Hz, this ratio can be computed from the formula

$$\text{Ratio} = 25^{1/(M-1)}.$$

We used the values M = 3, 5, 11, 21, 41, and 81. For example, successive components of the 81-component waveform had a ratio of 1.041, and the nearest components to the 1000-Hz component were 961 and 1041 Hz. The standard spectrum was always flat; that is, the components were all of equal amplitude. Both a sine and a cosine variation were used. For the three-component case, the ripple was simply an elevation at the 1000-Hz central component.
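The component frequencies for each density can be generated from the constant ratio 25^(1/(M-1)). The sketch assumes, as the 961/1041-Hz example implies, that every spectrum spans 200 to 5000 Hz, so that for odd M the middle component falls at 1000 Hz; the function name is illustrative.

```python
def log_spaced_components(m: int, f_lo: float = 200.0, f_hi: float = 5000.0):
    """Return (ratio, frequencies) for m components with a constant frequency ratio."""
    ratio = (f_hi / f_lo) ** (1.0 / (m - 1))
    return ratio, [f_lo * ratio ** n for n in range(m)]


for m in (3, 5, 11, 21, 41, 81):
    ratio, freqs = log_spaced_components(m)
    mid = (m - 1) // 2  # index of the 1000-Hz component for odd m
    below, center, above = freqs[mid - 1], freqs[mid], freqs[mid + 1]
    print(f"M = {m:2d}: ratio {ratio:.3f}, neighbors of {center:.0f} Hz "
          f"are {below:.0f} and {above:.0f} Hz")
```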

Figure 9 shows the data as a function of M, the number of components in the spectrum. The thresholds clearly decrease as the number of components increases to about 21, where the threshold value reaches about -24 dB. As the number of components is increased further, the function appears to rise only slightly. The cosine ripple appears to produce slightly poorer detection performance for all numbers of components, but the difference is small. A similar but much smaller difference between the sine and cosine ripples is also apparent in Table I, at the lowest frequency of ripple.

FIG. 9. The rms signal threshold for a 2-cycle ripple as a function of the number of components in the ripple, M. Circles are thresholds for sine ripples and triangles for cosine ripples. The square represents the threshold for a three-component flat spectrum with an increment on the center, 1000-Hz component. Error bars are the standard errors calculated over 18 runs for three subjects.

The results obtained with our 21-component waveforms were taken with a sufficient number of components to obtain sensitive detection performance. For our frequency region, 200 to 5000 Hz, we suspect the ability to distinguish a flat spectrum from a rippled one is largely independent of the number of components used to define the spectrum, as long as at least 21 components are used. The small increase in threshold evident in Fig. 9 for the 41- and 81-component densities is probably the result of the lack of extensive practice at those densities. Most of the observers' training in this sequence of experiments used a 21-component complex. Our 21 components span about 4.5 oct, a density corresponding to about 5 components per octave. The frequency spacing of the components near 1000 Hz is about 150 Hz, approximately one component per critical band. As Fig. 9 shows, once this density is achieved, the ability to detect a low-frequency ripple is essentially independent of component density.


IV.  CONCLUSIONS 

These studies have reported measurements of the ability to detect intensity changes in equal-energy (flat) spectra. The intensity changes investigated were of two types. In one set of experiments, the change in intensity is limited to a narrow frequency region. In the other, the intensity changes occur over the entire spectrum.


For changes confined to a narrow frequency region, the frequency at which the intensity change is produced influences the detectability of such a change only slightly. The midfrequency region (500 to 2000 Hz) appears to be where the smallest changes can be detected, but the extreme frequency regions are worse by only a few decibels. Changes in spectral shape near 7000 Hz are more difficult to detect than the same type of change near 1000 Hz by about 12 dB (Fig. 4), and the change in threshold as a function of frequency is gradual.

The psychometric functions for a change in the intensity of a single component appear to be the same whether a single component or multiple components are present. The function that approximates the form of the psychometric function is d' = k × (increment pressure).

For intensity changes occurring over a broader frequency region, the ability to detect a ripple over the range from 200 to 5000 Hz was essentially independent of the frequency of the ripple. Sine and cosine ripples were also nearly equal in detectability.

The comparison of the detection of broad changes versus narrow changes in the spectrum revealed an anomaly. The broader changes are more difficult to hear, by some 5 to 7 dB, than one would expect on the basis of a simple model that integrates the detectability over the separate frequency regions.

The ability to discriminate between simultaneously presented 100-Hz-wide bands of noise with envelopes that were either similar or dissimilar was measured. The center frequencies of the noise bands, $f_l$ and $f_l + \Delta f$ Hz, were systematically varied. When the bands of noise were separated by an octave, $\Delta f = f_l$, discriminations were at chance levels. For frequency separations less than an octave, $\Delta f < f_l$, discrimination was best for $f_l$ = 2500 and 4000 Hz, somewhat poorer for $f_l$ = 1000 Hz, and impossible for $f_l$ = 350 Hz. Listeners were also asked to discriminate between bands of noise with envelopes that were either perfectly or partially correlated, and bands with envelopes that were either uncorrelated or partially correlated. The data suggest that, when correlation is transformed to an equal-variance scale (Fisher's z), equal changes in Fisher's z lead to equal changes in detectability, regardless of the correlation of the envelopes of the reference signal.

PACS numbers: 43.66.Nm, 43.66.Mk, 43.66.Fe [WAY]
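Fisher's z is the inverse hyperbolic tangent of the correlation coefficient, z = ½ ln[(1 + r)/(1 - r)]. The short sketch below shows how it stretches the correlation scale near |r| = 1, which is why equal steps in z are not equal steps in r. Treating envelope correlation on this scale follows the abstract; the helper names are illustrative. Note that z diverges as |r| approaches 1, so r = 1 itself cannot be placed on this scale.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's r-to-z transform: z = atanh(r) = 0.5*ln((1 + r) / (1 - r)), |r| < 1."""
    return math.atanh(r)


def r_from_z(z: float) -> float:
    """Inverse transform: r = tanh(z)."""
    return math.tanh(z)


# Equal steps of 0.5 in z map onto shrinking steps in r as r approaches 1.
for z in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"z = {z:.1f}  ->  r = {r_from_z(z):.3f}")
```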


INTRODUCTION 

Although the importance of envelope synchrony has been studied extensively in the binaural domain, the importance of monaural envelope synchrony has received relatively little attention (see Goldstein, 1965, for a historical review). Two recent exceptions are the study of Schubert and Nixon (1970) and the comodulation masking release (CMR) studies introduced by Hall et al. (1984).

Schubert and Nixon (1970) examined the detectability of temporal synchrony for bands of noise of different center frequencies. In their experiment, low-pass noise was multiplied by two carriers, one at 350 Hz and a second one between 375 and 700 Hz. On some trials, the two bands were derived from a single source, and, on the remaining trials, the bands were derived from independent sources. Thus listeners were asked to indicate whether the simultaneously presented noise bands were synchronous (single generator) or independent (independent generators). Listeners were unable to make the discrimination.

More recent experiments have suggested that listeners are indeed able to compare envelopes extracted in different frequency regions. For example, Hall et al. (1984) showed that the masked threshold for a tone in a band of noise is reduced when a second, temporally synchronous band of noise is present at a frequency removed from the signal. They referred to this as comodulation masking release, or CMR.

The extent of release varies from a few dB to as much as 10 dB, depending on the separation in center frequencies of the two bands of noise. Further, CMR is observed for monaurally, diotically, and dichotically presented stimuli, and whether the second noise band is at a frequency higher or lower than the noise band containing the signal (Hall et al., 1984; McFadden, 1986; Cohen and Schubert, 1987).

A recurrent issue in CMR experiments is the importance of envelope correlation between the two bands of noise. In the “noise alone” interval, the two noise bands have identical envelopes. In the “signal plus noise” interval, the addition of the signal to one of the noise bands degrades the correlation between the envelopes. Thus the reduction in the masked threshold that defines CMR may reflect the discrimination of changes in correlation associated with the addition of a signal to one of the two noise bands.

Given the experiment of Schubert and Nixon, this may seem an unlikely explanation. However, Schubert and Nixon examined the discrimination of envelope correlation at relatively low frequencies (350 Hz), while the CMR experiments have yielded the largest effects at higher frequencies (1000 Hz and above). It seemed reasonable, then, to repeat the experiment of Schubert and Nixon at higher frequencies in order to determine whether envelope correlations are detectable at frequencies where CMR is measured.

Experiments 1 and 2 examine whether or not noise bands with identical envelopes can be discriminated from noise bands with envelopes that are statistically independent. A large portion of the auditory spectrum was examined in order to determine the effects of frequency region and frequency separation. Experiment 3 examines the auditory system's sensitivity to changes in monaural envelope correlation. Finally, the data of experiment 3 are examined in an effort to determine the plausibility of the proposal that changes in envelope correlation are responsible for the reduction in masking observed in the CMR paradigm.

I.  EXPERIMENT  1:  MONAURAL  ENVELOPE 
CORRELATION  PERCEPTION  AS  A  FUNCTION  OF 
CENTER  FREQUENCY  AND  FREQUENCY  SEPARATION 

A.  Procedure 

Listeners were asked to discriminate between two signals, one composed of two noise bands with identical envelopes (paired signal), the other composed of statistically independent noise bands (independent signal). For both signals, the two noise bands were centered at frequencies $f_l$ (lower center frequency) and $f_l + \Delta f$.

1. Signal generation

In practice, the paired signals were the sum of two waveforms, $n_l(t)$ and $n_h(t)$, each of which is the sum of several cosines:

$$n_l(t) = \sum_{i=-m}^{m} a_i \cos\left[2\pi (f_l + i\delta)t + \theta_i\right], \qquad (1a)$$

$$n_h(t) = \sum_{i=-m}^{m} a_i \cos\left[2\pi (f_l + \Delta f + i\delta)t + \theta_i\right], \qquad (1b)$$

where δ is the frequency separation of the tones added to generate the noise, $f_l$ is the center frequency of the lower band, and Δf is the difference between the low and high center frequencies. Here, δ was 10 Hz and m was 5, yielding nominally 100-Hz-wide bands of noise. The independent bands were generated similarly, except that the $a_i$'s and $\theta_i$'s of Eqs. (1a) and (1b) were independently chosen rather than being identical. Note that, for the paired signals, the noise bands are uncorrelated, since they are centered at different (orthogonal) frequencies. The envelopes, however, are identical. In contrast, the bands that comprise the independent signals are statistically independent, both in their envelope and waveform characteristics.
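Equations (1a) and (1b) can be sketched in NumPy to check the two claims in the paragraph above: paired bands are uncorrelated as waveforms yet share an envelope. Assumptions: δ = 10 Hz and m = 5 as stated; Rayleigh amplitudes and uniform phases as described in the following paragraph; the 14.3-kHz sampling rate from the stimulus description; a hypothetical 0.5-s duration and 1000/1500-Hz center frequencies, chosen so every tone completes a whole number of cycles; and an FFT-based analytic signal for the envelope (the construction that scipy.signal.hilbert also uses).

```python
import numpy as np

DELTA = 10.0     # tone spacing (Hz)
M_HALF = 5       # tones i = -m..m, i.e. 11 tones per band
FS = 14300.0     # sampling rate (Hz)
DUR = 0.5        # hypothetical duration (s): an integer number of cycles for every tone


def noise_band(center_hz, amps, phases, t):
    """Eq. (1): sum of cosines at center_hz + i*DELTA, i = -M_HALF..M_HALF."""
    freqs = center_hz + np.arange(-M_HALF, M_HALF + 1) * DELTA
    return np.sum(amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None]), axis=0)


def hilbert_envelope(x):
    """Magnitude of the analytic signal, built by zeroing negative-frequency FFT bins."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


rng = np.random.default_rng(1)
t = np.arange(int(round(FS * DUR))) / FS
n_tones = 2 * M_HALF + 1
a, theta = rng.rayleigh(size=n_tones), rng.uniform(0.0, 2 * np.pi, n_tones)
a2, theta2 = rng.rayleigh(size=n_tones), rng.uniform(0.0, 2 * np.pi, n_tones)

low = noise_band(1000.0, a, theta, t)            # f_l
high_paired = noise_band(1500.0, a, theta, t)    # f_l + delta_f, same a_i and theta_i
high_indep = noise_band(1500.0, a2, theta2, t)   # independently drawn a_i and theta_i

waveform_r = np.corrcoef(low, high_paired)[0, 1]
envelope_r_paired = np.corrcoef(hilbert_envelope(low), hilbert_envelope(high_paired))[0, 1]
envelope_r_indep = np.corrcoef(hilbert_envelope(low), hilbert_envelope(high_indep))[0, 1]
print(f"waveform r = {waveform_r:.3f}, envelope r (paired) = {envelope_r_paired:.3f}, "
      f"envelope r (independent) = {envelope_r_indep:.3f}")
```

The shared envelope follows from the structure of Eq. (1): shifting every tone by Δf multiplies the complex envelope by a pure carrier term, which leaves its magnitude unchanged.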

All of the signals used were generated and played using an IBM PC microcomputer. For each ($f_l$, $f_l + \Delta f$) frequency pair tested, 32 paired noise bands were generated. The low-frequency bands were generated first. The amplitude ($a_i$) of each tone was chosen at random from Rayleigh-distributed values and the phase ($\theta_i$) from values uniformly distributed between zero and 2π. Next, the associated high-frequency bands were generated. Those 11 tones had the same amplitudes and phases as their low-frequency counterparts, but their frequencies were increased by Δf Hz.

After the 32 paired noise bands were computed and stored, the stimuli used in the experiment were selected in the following manner. First, one set of paired noise bands was chosen at random, and the two bands were added to produce a paired signal. To generate an independent signal, a low- and a high-frequency band were chosen and added if their envelopes were not the same (i.e., they did not form a pair).
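The generation and selection steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the original IBM PC code: the helper names (`make_band`, `envelope`, `make_pair_pool`, `draw_trial_signals`) are hypothetical, the Rayleigh scale of 1 is arbitrary, and the envelope is taken as the magnitude of an FFT-based analytic signal (the text does not say how envelopes were computed). That envelope is exact here only because, with a 100-ms record at 14.3 kHz, every 10-Hz-spaced component falls on an FFT bin.

```python
import numpy as np

FS = 14_300      # sampling rate (Hz), as stated in the text
DUR = 0.100      # signal duration (s)
DELTA = 10.0     # spacing delta between adjacent tones (Hz)
M = 5            # tones at i = -M..M: 11 tones, nominally 100 Hz wide


def make_band(f_center, amps, phases, fs=FS, dur=DUR, delta=DELTA):
    """Sum of a_i * cos[2*pi*(f_center + i*delta)*t + theta_i], i = -m..m."""
    m = (len(amps) - 1) // 2
    t = np.arange(int(round(fs * dur))) / fs
    freqs = f_center + np.arange(-m, m + 1) * delta
    comps = amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None])
    return comps.sum(axis=0)


def envelope(x):
    """Magnitude of the analytic signal, built by zeroing negative FFT bins."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    gain[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))


def make_pair_pool(f_low, delta_f, n_pairs=32, m=M, seed=0):
    """Pool of paired bands: same a_i and theta_i at f_low and f_low + delta_f."""
    rng = np.random.default_rng(seed)
    pool = []
    for _ in range(n_pairs):
        amps = rng.rayleigh(scale=1.0, size=2 * m + 1)        # Rayleigh amplitudes
        phases = rng.uniform(0.0, 2 * np.pi, size=2 * m + 1)  # uniform on [0, 2*pi)
        pool.append((make_band(f_low, amps, phases),
                     make_band(f_low + delta_f, amps, phases)))
    return pool


def draw_trial_signals(pool, rng):
    """Return (paired, independent) signals drawn from the stored pool."""
    k = rng.integers(len(pool))
    paired = pool[k][0] + pool[k][1]                    # low and high band of one pair
    i, j = rng.choice(len(pool), size=2, replace=False)  # i != j, so not a pair
    independent = pool[i][0] + pool[j][1]
    return paired, independent
```

For example, with `pool = make_pair_pool(2500.0, 250.0)`, the two members of `pool[0]` have envelopes that agree to numerical precision while their waveforms are uncorrelated, which is the property the paired condition relies on.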

Figure 1 illustrates two hypothetical signals: a paired signal, shown on the top, and an independent signal, shown on the bottom. Also shown are the amplitude spectra and the waveforms associated with the two noise bands that are combined to make each signal. In this example, f_L was 350 Hz and f_L + Δf was 630 Hz. Note that the bands of noise that comprise the paired signal have identical envelopes and the same relative amplitude spectra.

The stimuli were played through a 12-bit D/A converter at a sampling rate of 14.3 kHz and low-pass filtered (Kemo VBF/23) at 6 kHz. The signal duration was 100 ms, including 5-ms cosine-squared onset/offset ramps.
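The 5-ms cosine-squared ramps mentioned above can be applied as in the sketch below. It assumes the common raised-cosine form, (1 − cos)/2 = sin², for the rise, which the text does not spell out; the function name is hypothetical.

```python
import numpy as np


def apply_cos2_ramps(x, fs=14_300, ramp_s=0.005):
    """Apply cosine-squared (raised-cosine) onset and offset ramps."""
    n = int(round(fs * ramp_s))                  # ramp length in samples
    k = np.arange(n)
    rise = np.sin(0.5 * np.pi * k / n) ** 2      # = (1 - cos(pi*k/n)) / 2, from 0 toward 1
    y = np.asarray(x, dtype=float).copy()
    y[:n] *= rise                                # onset
    y[-n:] *= rise[::-1]                         # offset (mirror image of the onset)
    return y


n_samples = int(round(14_300 * 0.100))           # 100-ms signal
gated = apply_cos2_ramps(np.ones(n_samples))
```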


2.  Method 

In one interval of a 2IFC paradigm, the signal was paired; in the other interval the signal was independent. Listeners indicated which of the two intervals contained the paired bands. The dependent variable was the percent correct discriminations.

FIG. 1. Two stimuli are shown. The top panel presents the paired noise bands, whose envelopes are identical. The bottom panel shows two independent noise bands, whose envelopes are also independent. The summed waveform, amplitude spectra, and the waveforms of the individual bands of noise are drawn.

Four values of f_L were tested: 350, 1000, 2500, and 4000 Hz. Table I indicates the 17 (f_L, f_L + Δf) pairs tested. The difference between the center frequencies of the two noise bands is expressed in terms of Δf/f_L, the normalized frequency separation. The extent of separation (Δf) ranged between 100 Hz (the noise bandwidth) and an octave.

TABLE I. The center frequencies of the bands of noise used in experiment 1. The left-hand column indicates the lower center frequency f_L, and the body of the table indicates the associated higher center frequencies tested, f_L + Δf. Along the top, the relative frequency separation, Δf/f_L, is indicated.

[Table I body not recovered; column header: Relative frequency separation (Δf/f_L).]

Subjects listened in a sound-treated room. The stimuli were presented diotically via Sennheiser HD 414 SL earphones at an average level of 65 dB SPL per component, or a spectrum level of 55 dB SPL. The signal durations were 100 ms, the two listening intervals were separated by 300 ms, and both intervals were indicated on a display screen. Following the subject's response, feedback was displayed for 240 ms. Conditions were tested in 50-trial blocks, 6 blocks at a time; thus each data point is based on 300 trials. Subjects typically completed 20 blocks a day, and conditions were completed in random order.
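The equivalence stated above between 65 dB SPL per component and a 55-dB spectrum level follows from spreading each tone's average power over the 10-Hz tone spacing. A small sketch of that arithmetic, with hypothetical function names and assuming equal average power per component (the overall-level line is a derived figure for reference, not a value reported in the text):

```python
import math


def spectrum_level_db(per_component_db, spacing_hz):
    """Spectrum level of equally spaced tones: each tone's power spread over one spacing."""
    return per_component_db - 10.0 * math.log10(spacing_hz)


def overall_level_db(per_component_db, n_tones):
    """Level of n equal-power components with random phases (powers add on average)."""
    return per_component_db + 10.0 * math.log10(n_tones)


print(spectrum_level_db(65.0, 10.0))          # 55.0 dB SPL spectrum level
print(round(overall_level_db(65.0, 11), 1))   # 75.4 dB SPL for the 11-tone band
print(6 * 50)                                  # 300 trials per data point
```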

Listeners were undergraduate students paid to participate, except GR, who is the author. All had normal hearing. Listeners first heard the (2500, 2750)-Hz condition. After the first few sets of 50 trials, further practice was not needed. Throughout the experiment, several conditions were repeated. In no case was an effect of practice evident. For the repeated conditions, only the last 300 trials contributed to the data reported.

B.  Results  and  discussion 

Figure 2 shows the percent correct identifications as a function of normalized frequency separation, Δf/f_L, between the two bands of noise. Since the two noise bands were centered at f_L and f_L + Δf, a Δf/f_L of 1 indicates that the two bands were separated by an octave. Data for two conditions, f_L = 2500 (circles) and f_L = 4000 (triangles) Hz, are presented. The data for each of the three subjects are plotted separately.

When the low-frequency noise band was centered at either 2500 or 4000 Hz, performance deteriorated as the separation between the two noise bands was increased. When the relative separation was at or below 0.1, performance was nearly perfect. As the separation approached an octave, performance approached chance levels. Extrapolating from the averaged data, discriminability reached 75% correct at Δf/f_L ≈ 0.3.

Figure 3 shows the data for f_L = 350 and 1000 Hz. Again, the parameter is f_L. These results differ from those observed at higher frequencies (Fig. 2). When f_L was 1000 Hz, performance did not change monotonically with Δf. Rather, the best performance was observed for frequency separations between 200 and 400 Hz. When f_L was 350 Hz, listeners performed at chance levels. The 350-Hz data provide a replication of Schubert and Nixon's (1970) findings.

FIG. 2. Percent correct is plotted as a function of relative frequency separation, Δf/f_L. The lower center frequency was either 2500 (circles) or 4000 (triangles) Hz. Data for the three subjects are plotted separately. Error bars indicate one standard error of the mean.

It is clear that listeners are able to detect whether or not simultaneously presented noise bands have correlated or uncorrelated envelopes. But, as has been pointed out by McFadden (1987), we must be careful not to envision the proximal stimulus as two separately resolved bands of noise. For example, in the high-frequency conditions (f_L = 2500, 4000 Hz), performance was best when the bands of noise were closest, i.e., in those conditions in which interactions within a single critical band were most likely to occur.

The fact that performance is poor when the noise bands are centered below 1000 Hz is consistent with the hypothesis that temporal envelopes are important for discrimination. Our results are in line with those of Henning and of Henning and Ashton (1981), who found that the detectability of the interaural delay of the envelopes of sinusoidally amplitude-modulated tones was poor until the center frequency of the signal exceeded 1000 Hz. Further, Bernstein and Trahiotis (1985) showed that the envelope delay of sinusoidally amplitude-modulated signals contributed little to intracranial movement when the center frequency of the signal was smaller than 1000 Hz, but that envelope delays led to relatively large intracranial movement when the center frequency exceeded about 1000 Hz.

FIG. 3. Percent correct is plotted as a function of relative frequency separation, Δf/f_L. The lower center frequency (f_L) was either 350 or 1000 Hz. Data for the three subjects are plotted separately.

Unfortunately, the comparison between the low- (f_L = 350, 1000 Hz) and high- (f_L = 2500, 4000 Hz) frequency regions may have been contaminated by the use of a fixed noise bandwidth. When the center frequencies are high, the 100-Hz band of noise is presumably narrow enough to pass through a single critical band. At lower frequencies, however, the noise bandwidths may have exceeded critical bandwidths. Thus, at low frequencies, the opportunity for envelope comparisons may have been limited by the fact that no two critical bands had outputs with identical envelopes. For this reason, the experiment was repeated at the lower frequencies, with the change that narrower noise bands were used.
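The bandwidth argument above can be made concrete by comparing the 100-Hz noise band with an auditory-filter bandwidth at each f_L. The sketch below uses the Glasberg and Moore (1990) equivalent-rectangular-bandwidth (ERB) formula purely as an illustrative stand-in for "critical band"; that formula postdates this work and is our assumption, not the measure the author used. Under it, 100 Hz is wider than one ERB at 350 Hz but only about a third of an ERB or less at 2500 and 4000 Hz.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg and Moore, 1990 formula)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


NOISE_BW = 100.0  # Hz, bandwidth used in experiment 1
for f_low in (350, 1000, 2500, 4000):
    ratio = NOISE_BW / erb_hz(f_low)
    print(f"{f_low:5d} Hz: ERB = {erb_hz(f_low):6.1f} Hz, band/ERB = {ratio:.2f}")
```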

Additionally, the experiment was repeated in the presence of low-pass noise. This control was included in an effort to reduce the subjects' reliance on low-frequency distortion products that may have been present due to the nonlinear nature of either the signal generation apparatus or the auditory system.

J. Acoust. Soc. Am., Vol. 82, No. 5, November 1987

II.  EXPERIMENT  2A:  THE  EFFECT  OF  BANDWIDTH 

A.  Procedure 

The procedure was nearly identical to that of the first experiment, except that relatively narrow bandwidths were used. Two frequency regions were tested, f_L = 350 and 1000 Hz. When the low-frequency band was centered at 350 Hz, three tones made up the noise band, yielding a nominally 20-Hz-wide band of noise. When the low-frequency band was centered at 1000 Hz, a nominally 40-Hz-wide band of noise was used. These bandwidths were chosen so that the bandwidth of the lower noise band relative to its center frequency was approximately that used in experiment 1 when f_L was 2500 Hz (i.e., a relative width of 0.04). As in experiment 1, the spectrum level was set at 55 dB SPL. The same listeners participated, and again little or no practice was needed.
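The relative widths quoted above, and the number of tones per band, can be checked with a few lines. This sketch assumes the 10-Hz tone spacing of experiment 1 was kept, so that a band of width B contains B/δ + 1 tones; the text confirms this only for the 20-Hz (three-tone) band, so the five-tone count for the 40-Hz band is an inference.

```python
DELTA = 10.0  # tone spacing (Hz), assumed unchanged from experiment 1


def n_tones(bandwidth_hz, delta=DELTA):
    """Tones at f_L + i*delta, i = -m..m, spanning the nominal bandwidth."""
    return int(round(bandwidth_hz / delta)) + 1


conditions = {350: 20.0, 1000: 40.0, 2500: 100.0}  # f_L (Hz) -> nominal bandwidth (Hz)
for f_low, bw in conditions.items():
    print(f_low, n_tones(bw), round(bw / f_low, 3))  # relative width bw / f_L
```

The 350-Hz band comes out somewhat wider in relative terms (about 0.057) than the 0.04 of the other two, consistent with the text's "approximately".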

B.  Results  and  discussion 

Figure 4 shows percent correct as a function of normalized frequency separation, Δf/f_L, of the two bands of noise. The comparison of interest is with the data presented in Fig. 3. For GN, there is little effect of reduced bandwidth for any of the conditions tested. For the other two listeners, ZM and GR, performance at the smallest frequency separation for f_L = 350 Hz appears to have exceeded chance. Otherwise, for f_L = 350 Hz, performance was unchanged. In the 1000-Hz condition, ZM and GR performed more poorly when the narrower bandwidths were used.

Across all listeners and center frequencies, the reduction in bandwidth did not lead to performance levels as good as those found at higher frequencies (Fig. 2). Thus the inability to detect changes in envelope correlation at relatively low frequencies does not appear to be due to limitations imposed by the bandwidth of the noise.

These data present two curious interactions for which we have no explanation. First, the between-listener differences (ZM and GR vs GN) are quite striking. Second, when there is an effect, reducing the noise bandwidth is marginally advantageous in one instance (f_L = 350 Hz) and detrimental in the other (f_L = 1000 Hz).

III.  EXPERIMENT  2B:  THE  EFFECT  OF  LOW-PASS 
NOISE 

A.  Procedure 

The procedure was similar to that of the first experiment, except that low-pass noise was continuously present (noise source: General Radio Company, 1382; low-pass filter: Kemo VBF/23). This experiment was included in order to determine whether low-frequency distortion products were affecting the discrimination of envelope correlations. The spectrum level of the low-pass noise was fixed at 35 dB SPL, and the cutoff frequency was located approximately one-third of an octave below the lowest component of the lower noise band. At the frequency of the lowest component of the narrow band of noise, the low-pass noise was approximately 60 dB below the level of the narrow band of noise. Three frequency regions, f_L = 1000, 2500, and 4000 Hz, were tested. In this experiment, subjects improved with practice, and so several practice runs were needed in order for the subjects to achieve asymptotic levels of performance.

FIG. 4. Percent correct as a function of normalized frequency separation, Δf/f_L, for narrower noise bands: the noise bands had bandwidths of 20 and 40 Hz for f_L values of 350 (●) and 1000 (▲) Hz, respectively. Data for the three subjects are plotted separately.
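The cutoff placement described above (one-third of an octave below the lowest component of the lower band) works out as follows. This sketch assumes the 100-Hz bands of experiment 1 (m = 5, δ = 10 Hz), so the lowest component sits at f_L − 50 Hz; the function name is hypothetical.

```python
import math


def lowpass_cutoff_hz(f_low, m=5, delta=10.0, octaves_below=1.0 / 3.0):
    """Cutoff placed a fraction of an octave below the lowest tone, f_L - m*delta."""
    lowest = f_low - m * delta
    return lowest * math.pow(2.0, -octaves_below)


for f_low in (1000, 2500, 4000):
    cutoff = lowpass_cutoff_hz(f_low)
    # third column: distance from cutoff to lowest tone, in octaves (should be 1/3)
    print(f_low, round(cutoff, 1), round(math.log2((f_low - 50) / cutoff), 3))
```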

B.  Results  and  discussion 

Figure 5 shows performance obtained when low-pass noise was present. Three frequency regions were tested, f_L = 1000, 2500, and 4000 Hz. Comparing these data to the data of experiment 1 (Figs. 2 and 3), it can be seen that low-pass noise depresses performance, especially when the performance levels had been high. For the smaller frequency separations (Δf/f_L < 0.4), performance was, on average, lower when low-pass noise was present. Whether the reduction reflects the fact that the added low-pass noise interfered with the lower noise band or the masking of nonlinear distortions is not obvious. In any event, information introduced via nonlinear distortions cannot be the sole cue used for discrimination, since scores of 100% were obtained even when the experiment was completed in the presence of low-pass noise.

IV. EXPERIMENT 3: DETECTING CHANGES IN
ENVELOPE CORRELATION

This experiment differs from those previously described in that the envelopes of the low- and high-frequency noise bands were no longer either identical (the paired condition above) or independent. Instead, noise bands with envelopes that were partially correlated were also used. In this experiment, the function relating changes in envelope correlation to discriminability was determined. To this end, listeners discriminated between signals composed of noise bands whose envelopes were either partially correlated or independent, and between noise bands whose envelopes were either partially or fully correlated.


A. Procedure

1.  Signal  generation 

The partially correlated stimuli were created in a manner similar to the "three-generator case" described by Licklider and Dzendolet (1948; see also Jeffress and Robinson, 1962). The combination algorithm used to generate the stimuli was as follows. First, 32 paired noise bands were computed [Eqs. (1a) and (1b)]. As before, the paired bands had identical envelopes. Adding high- and low-frequency bands that did not form a pair yielded stimuli whose envelopes were independent.

The stimuli whose envelopes were only partially correlated were generated by combining four noise bands, two paired bands, w_l(t) and w_h(t), and two independent bands, u_l(t) and u_h(t):

$$s(t)=\alpha\left[w_l(t)+w_h(t)\right]+\beta\left[u_l(t)+u_h(t)\right]. \tag{2}$$

In order to achieve a particular correlation, the paired high/low bands were multiplied by one constant (α), and the independent high/low bands were multiplied by a second constant (β). The constraint that α² + β² = 1 ensured that stimuli whose envelopes were partially correlated retained the same average level as the primary signals.

The correlation coefficients, which depend on α and β, were computed empirically. They represent the correlation of the resultant high- and low-frequency envelopes, not the correlation of the high- and low-frequency waveforms (which is zero due to the different center frequencies). In order to evaluate the correlations, 100 simulations were completed. Waveforms were computed according to Eqs. (1) and (2), the high- and low-frequency envelopes were extracted, and the Pearson product-moment correlation coefficient (r) was computed. The correlation coefficients were then transformed to normally distributed z scores according to Fisher's r-to-z transformation, z = ½[ln(1 + r) − ln(1 − r)] (McNemar, 1969). The resulting 100 Fisher's z scores were then averaged, and the average value was transformed back to r. The resulting averaged correlation coefficients are used to identify the stimuli. This somewhat arduous process was followed so that a single r could be used to identify each stimulus.
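The procedure above (build Eq. (2) stimuli, extract both envelopes, compute Pearson's r, average in Fisher's z, transform back) can be sketched as follows. Assumptions, none of which come from the text: envelopes are the magnitude of an FFT-based analytic signal, α is the free parameter with β = √(1 − α²), the bands follow the 2500/2750-Hz condition of this experiment, and all names are hypothetical. Note that `np.arctanh(r)` is exactly Fisher's z = ½[ln(1 + r) − ln(1 − r)], and `np.tanh` is its inverse.

```python
import numpy as np

FS, DUR, DELTA, M = 14_300, 0.100, 10.0, 5
T = np.arange(int(round(FS * DUR))) / FS


def band(f_center, amps, phases):
    """Eq. (1)-style band: sum of a_i cos[2 pi (f_center + i delta) t + theta_i]."""
    freqs = f_center + np.arange(-M, M + 1) * DELTA
    return (amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * T + phases[:, None])).sum(axis=0)


def random_params(rng):
    """Rayleigh amplitudes and uniform phases for 2M + 1 tones."""
    return rng.rayleigh(1.0, 2 * M + 1), rng.uniform(0.0, 2 * np.pi, 2 * M + 1)


def envelope(x):
    """Magnitude of the FFT-based analytic signal."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    gain[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))


def mean_envelope_r(alpha, f_low=2500.0, delta_f=250.0, n_sims=100, seed=0):
    """Average low/high envelope correlation of Eq. (2) stimuli, averaged via Fisher's z."""
    beta = np.sqrt(1.0 - alpha ** 2)                 # enforces alpha^2 + beta^2 = 1
    rng = np.random.default_rng(seed)
    z = np.empty(n_sims)
    for k in range(n_sims):
        a, th = random_params(rng)                   # shared by the paired bands
        (a_l, th_l), (a_h, th_h) = random_params(rng), random_params(rng)  # independent bands
        low = alpha * band(f_low, a, th) + beta * band(f_low, a_l, th_l)
        high = alpha * band(f_low + delta_f, a, th) + beta * band(f_low + delta_f, a_h, th_h)
        r = np.corrcoef(envelope(low), envelope(high))[0, 1]
        z[k] = np.arctanh(np.clip(r, -1 + 1e-12, 1 - 1e-12))  # clip so r = 1 stays finite
    return np.tanh(z.mean())                         # average in z, transform back to r


for alpha in (0.0, 0.5, 0.8, 1.0):
    print(alpha, round(float(mean_envelope_r(alpha)), 2))
```

The sketch's numbers are not the coefficients used to label the stimuli in the experiment; they only show how a single summary r per (α, β) setting can be obtained.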

2.  Method 

In a 2IFC paradigm, listeners discriminated between diotically presented signals, each composed of two 100-Hz-wide noise bands. The low-frequency (f_L) noise band was centered at 2500 Hz and the high-frequency (f_L + Δf) band was centered at 2750 Hz. The change in envelope correlation was the independent variable, and percent correct discrimination was the dependent variable.

One of the listeners, GR, had participated in experiments 1 and 2 prior to this experiment. The others were naive concerning this type of experiment, but had participated in other auditory experiments. As ON was unable to complete the experiment, VF was recruited midway through the experiment. All naive listeners began with the correlated versus uncorrelated condition, and data collection began after approximately 150 practice trials. Conditions were occasionally repeated, and there was no evidence of improvement throughout the experiment.

B.  Results  and  discussion 

Figure 6 shows the percent of correct discriminations as a function of r for the condition in which one interval contained bands of noise whose envelopes were independent and the other interval contained bands with envelopes that were partially correlated. Figure 7 presents the data for the fully versus partially correlated condition. Extrapolating from the data of Fig. 6, we see that a change in correlation of approximately 0.6 is expected to be needed in order to discriminate, with 75% accuracy, noise bands with partially correlated envelopes from those with uncorrelated envelopes. In contrast, a change in correlation of only 0.15 is expected to be required in order to discriminate, with 75% accuracy, between noise bands with partially versus fully correlated envelopes.

The difference between the change in correlation needed to detect a departure from full correlation (Δr ≈ 0.15) and the change needed to detect a departure from zero correlation (Δr ≈ 0.6) does not imply unequal sensitivity. Consider the theoretical probability density functions for correlation coefficients shown in the top portion of Fig. 8. These distributions assume that the underlying population is bivariate normal (Bickel and Doksum, 1977). The histograms observed for our stimuli are also indicated. When r is near zero, the sampled distribution is symmetric and roughly bell shaped. As r increases, the distributions become increasingly skewed, since values of r larger than 1 cannot occur. Finally, when the population correlation is 1, there is no variance, since any draw will result in a correlation of 1.

In order to consider the auditory system's sensitivity to changes in correlation, a random variable whose variance is independent of its mean is desirable. The Fisher r-to-z transform, indicated in Fig. 8, achieves this goal: Fisher's z is an approximately normally distributed random variable whose variance is nearly independent of its mean. (Simulations indicate that the transform may be reasonably applied to noise envelopes, even though the underlying distributions are Rayleigh rather than normal.) Following the signal detection approach, we assume that the observer's performance is dictated by changes in d′. It follows that equal changes in z should yield equal discriminability, regardless of the standard. Thus one may expect that transforming the data of Figs. 6 and 7 from r to Δz should lead to more similar curves.

FIG. 7. Percent correct is plotted as a function of the correlation of the partially correlated signal. The envelopes of the standard noise bands were fully correlated. Data for VF, GR, and MS are shown.

FIG. 9. Average percent correct is plotted as a function of Δ Fisher's z. Separate symbols represent a reference correlation of one and a reference correlation of zero.
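The variance-stabilizing property invoked above can be checked numerically for the bivariate-normal case cited in the text (this does not test noise envelopes, which the text addresses only through the author's own simulations). The sketch uses an arbitrary sample size n = 20 and compares the spread and skew of r with the spread of z = arctanh(r) for several population correlations ρ; the standard large-sample result is that z has variance near 1/(n − 3) whatever ρ is. Function names are hypothetical.

```python
import numpy as np


def r_and_z_samples(rho, n=20, reps=2000, seed=0):
    """Sample Pearson r (and Fisher z) from bivariate normal data with correlation rho."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))
    y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal((reps, n))
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    return r, np.arctanh(r)


def skewness(v):
    """Sample skewness (third standardized moment)."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


n = 20
for rho in (0.0, 0.5, 0.9):
    r, z = r_and_z_samples(rho, n=n)
    print(f"rho={rho}: sd(r)={r.std():.3f} skew(r)={skewness(r):+.2f} "
          f"sd(z)={z.std():.3f} (1/sqrt(n-3)={1 / np.sqrt(n - 3):.3f})")
```

As ρ grows, r becomes less variable and negatively skewed (it is bounded by 1), while the spread of z stays roughly constant, which is why equal Δz rather than equal Δr is the natural candidate for equal discriminability.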

Figure 9 presents the data of Figs. 6 and 7 on a single abscissa, Δ Fisher's z. This figure was derived in the following manner. First, r's were transformed to z's. For the case in which the reference signal had uncorrelated noise bands, Δz is simply the z associated with the partially correlated signal (i.e., the z associated with an r of 0 is zero, so Δz = z − 0 = z). Next, consider the case in which the reference signal had perfectly correlated envelopes. In theory, an r of 1 is transformed to a z of infinity. Here, a finite z was chosen so that the percent correct identifications were best predicted (according to the least-squares criterion). The z so determined had a value of 1.9 and accounted for some 98% of the data's variance. Thus the Δz associated with the partially correlated envelopes was taken to be 1.9 − z. Note that only the z associated with an r of 1 was altered (changes in other nonzero values were so small as to be insignificant). Mapping an r of 1 to a z of 1.9 may be viewed as a measure of the internal jitter or noise associated with the auditory and/or decision-making system.

FIG. 8. The distribution of correlation coefficients is indicated on the r axis. The transformed distributions are indicated on the z axis. Solid lines represent theoretical distributions, and dashed lines are the observed histograms.

Figure 9 is based on the averaged data of subjects GR and MS (the only two who participated in both portions of this experiment). When performance is plotted as a function of Δz, the curves are essentially the same. Thus sensitivity to changes in envelope correlation appears to be independent of the reference correlation, provided the data are plotted in terms of Δz rather than r.
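To make the Δz mapping concrete, the rounded 75%-correct values quoted earlier for this experiment (Δr ≈ 0.6 from a reference of 0, Δr ≈ 0.15 from a reference of 1) can be pushed through the transform with the fitted ceiling of z = 1.9. This is an illustrative calculation on extrapolated, rounded values, not a reanalysis of the data; the function name is hypothetical.

```python
import numpy as np

Z_CEILING = 1.9  # z assigned to r = 1 by the least-squares fit described above


def delta_z(r_partial, reference_r):
    """Delta Fisher's z between a partially correlated signal and the reference."""
    z = np.arctanh(r_partial)
    if reference_r == 0.0:
        return z                     # z(0) = 0, so delta z = z
    if reference_r == 1.0:
        return Z_CEILING - z         # finite stand-in for z(1) = infinity
    raise ValueError("only reference correlations of 0 and 1 are handled here")


# Approximate 75%-correct points: r = 0.6 from a reference of 0,
# and r = 1 - 0.15 = 0.85 from a reference of 1.
dz_from_zero = delta_z(0.6, 0.0)
dz_from_one = delta_z(0.85, 1.0)
print(round(float(dz_from_zero), 2), round(float(dz_from_one), 2))  # 0.69 0.64
```

The two threshold changes in r differ by a factor of four, but the corresponding Δz values are close, in line with the near-identical curves of Fig. 9.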

In the Appendix, the binaural data of Pollack and Trittipoe (1959) and Gabriel and Colburn (1981) are considered. In their studies, subjects indicated differences in the correlation of noises presented to the two ears. The Appendix presents these data in terms of changes in Fisher's z rather than changes in correlation. In general, the binaural data do not conform to the "equal changes in Fisher's z lead to equal discriminability" rule to the extent that the current data do.

V.  GENERAL  DISCUSSION 
A.  Comparisons  with  CMR 

Having established that changes in envelope correlation are discriminable, we may address the apparent contradiction between the data of Schubert and Nixon (1970) and the CMR data of Hall et al. (1984) and others. It seems that Schubert and Nixon failed to demonstrate discriminability because they tested only a low-frequency region (350 Hz). We have shown that changes in envelope correlation are indeed detectable, but it remains to be shown that envelope correlations per se are responsible for the reduction in masking observed in the CMR paradigm.

These experiments are not sufficient to resolve this issue. The data of experiment 3 may, however, be used to examine the plausibility of such an argument. Based on Green et al. (1964), a 2500-Hz signal in noise should be detected when an E/N0 of approximately 12 dB is used. Table II presents both Fisher's z scores and correlation coefficients for the case in which a tone is added to one of two noise bands whose envelopes are otherwise identical. The tone was added to a 100-Hz-wide band of noise centered at 2500 Hz. Several values of E/N0 were considered, the changes in value having been achieved by changing the level of the added 2500-Hz tone. The z scores are based on 100 computer simulations. Also indicated are the observed standard deviations and the expected percent correct for each E/N0 considered. These predictions are based on the data of Fig. 2.
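A rough version of the Table II simulation can be sketched as below. The calibration is our assumption, not the paper's: noise components have Rayleigh(1) amplitudes so that N0 = 1/δ per Hz, the tone spans the whole 100-ms record, it is added at 2500 Hz with a random phase to the lower member of a 2500/2750-Hz pair, and E/N0 is set through the tone amplitude. Envelope extraction and all names are likewise hypothetical, and the sketch's outputs are not the values in Table II; it only shows the direction of the effect (envelope correlation, and hence z, falls as E/N0 rises).

```python
import numpy as np

FS, DUR, DELTA, M = 14_300, 0.100, 10.0, 5   # 100-ms, 11-tone, 100-Hz-wide bands
T = np.arange(int(round(FS * DUR))) / FS


def band(f_center, amps, phases):
    """Sum of a_i cos[2 pi (f_center + i delta) t + theta_i], i = -M..M."""
    freqs = f_center + np.arange(-M, M + 1) * DELTA
    return (amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * T + phases[:, None])).sum(axis=0)


def envelope(x):
    """Magnitude of the FFT-based analytic signal."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    gain[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))


def tone_amplitude(e_over_n0_db, dur=DUR, delta=DELTA):
    """Tone amplitude giving the requested E/N0 (dB) under the assumed calibration.

    Rayleigh(scale=1) amplitudes give each noise tone mean power E[a^2]/2 = 1,
    so N0 = 1/delta per Hz; a tone of amplitude A has E = (A^2 / 2) * dur.
    """
    return np.sqrt(2.0 * 10.0 ** (e_over_n0_db / 10.0) / (dur * delta))


def z_with_tone(e_over_n0_db, f_low=2500.0, delta_f=250.0, n_sims=100, seed=0):
    """Mean and SD of Fisher's z when a tone at f_low is added to the lower paired band."""
    rng = np.random.default_rng(seed)
    amp = tone_amplitude(e_over_n0_db)
    z = np.empty(n_sims)
    for k in range(n_sims):
        a = rng.rayleigh(1.0, 2 * M + 1)
        th = rng.uniform(0.0, 2 * np.pi, 2 * M + 1)
        tone = amp * np.cos(2 * np.pi * f_low * T + rng.uniform(0.0, 2 * np.pi))
        low = band(f_low, a, th) + tone          # paired band plus the signal tone
        high = band(f_low + delta_f, a, th)      # its unaltered partner
        r = np.corrcoef(envelope(low), envelope(high))[0, 1]
        z[k] = np.arctanh(np.clip(r, -1 + 1e-12, 1 - 1e-12))
    return z.mean(), z.std()


for db in (-5, 0, 5, 10):
    mean_z, sd_z = z_with_tone(db)
    print(f"E/N0 = {db:3d} dB: mean z = {mean_z:.2f}, sd = {sd_z:.2f}, r = {np.tanh(mean_z):.2f}")
```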

As is clear from Table II, an E/N0 between 0 and 5 dB should be sufficient to detect the presence of a 2500-Hz signal if detection is based on changes in envelope correlation. This estimated signal-to-noise ratio is about 10 dB below the expected threshold for the detection of a 2500-Hz tone in noise. This drop in E/N0, about 10 dB, is in line with the change in threshold of about 5 to 10 dB expected on the basis of the CMR experiments of Cohen and Schubert (1987), who used somewhat similar stimulus parameters.

These rough estimates indicate that it is reasonable to suppose that envelope correlation and CMR experiments reflect similar processing, namely, the discriminability of changes in envelope correlation. The generality of this argument is, however, limited. At least two recent studies (Hall et al., 1987; Schooneveldt and Moore, 1987) have demonstrated that CMR may be obtained at low frequencies. In contrast, the current data, and those of Schubert and Nixon (1970), establish that changes in envelope correlation are not detectable at low frequencies. This is one area in which envelope correlation discrimination and CMR are at odds. Whether this difference reflects a difference between CMR and envelope correlation for noise bands centered at both high and low frequencies remains to be determined.


TABLE II. Fisher's z scores of the envelope correlation between two bands of noise whose envelopes would be identical, except that a tone is added to one of the bands. Several values of E/N0 were simulated, and the resulting Fisher's z scores (z), observed standard deviations of the z scores, and correlations (r) are shown. The expected performance levels (% correct) are also indicated.

[Table II entries not recovered.]

VI. CONCLUSIONS

These  experiments  lead  to  four  conclusions. 

(1) There is a frequency region (f_L, Δf) for which changes in envelope correlation can be detected and, in this region, performance depends on Δf. In general, as the separation between the bands of noise is increased, performance levels drop, approaching chance as the frequency separation approaches 1 octave.

(2) The discrimination of noise envelopes is not derived solely from combination tones introduced by nonlinearities in the auditory system.

(3) The discrimination of changes in monaural envelope correlation appears to be independent of the standard or reference correlation.

(4) If CMR is considered in terms of the changes in envelope correlation that are concomitant with the addition of a tone to one of two noise bands whose envelopes are otherwise identical, the current data predict reasonable values for the release from masking measured for bands of noise centered at high frequencies. In contrast, CMR may be obtained using low-frequency bands of noise, but envelope correlations do not appear to be extracted at low frequencies.

APPENDIX

Here, the binaural correlation data of Pollack and Trittipoe
(1959) and Gabriel and Colburn (1981) will be considered. In their
experiments, Pollack and Trittipoe (1959) asked listeners to
discriminate changes in the interaural correlation of wideband
noise. For the case in which the reference correlation was 0, a
change in correlation of 0.4 was needed for the subjects to perform
at a level of 75% correct. When the reference correlation was 1, a
change in correlation of 0.04 was needed for the subjects to
perform at a level of 75% correct. Although the authors mention
calculations concerning changes in Fisher's z, the extent of
evaluation is not indicated. Figure A1 presents their data against
the measure of Δ Fisher's z. Although their study indicates
superior sensitivity (an interaural correlation of 1 was mapped to
a z of 2.4), the function relating percent correct to changes in
Fisher's z is quite similar to ours (Fig. 9). In general, equal
changes in Fisher's z lead to equal discriminability. It should be
noted that, for these binaural studies, the reported correlation
coefficients are in terms of whole-waveform correlations rather
than envelope correlations, and that waveform correlations are
larger than correlations based on envelopes alone (except at r = 0
and r = 1, where whole-waveform and envelope correlations are the
same).

FIG. A1. As Fig. 9 except that the data are taken from the dichotic
experiment of Pollack and Trittipoe (1959). Again, ( ) indicates a
reference correlation of 1 and (O) indicates a reference
correlation of 0. Several other reference correlations are
included: 0.14 (▲), 0.5 (*), 0.7 (•), and 0.405 (■). The r's are
based on whole-waveform correlations, not the correlations between
envelopes.
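To make the Δ Fisher's z measure concrete, the sketch below converts the two Pollack and Trittipoe thresholds quoted above into changes in z = atanh(r). Because z is infinite at r = 1, it substitutes the finite value 2.4 to which, per the text, r = 1 was mapped. The helper name `fisher_z` is ours, not the authors'.

```python
import math

def fisher_z(r: float) -> float:
    """Fisher's r-to-z transform, z = atanh(r); infinite at |r| = 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("Fisher's z is infinite for |r| = 1")
    return math.atanh(r)

# Reference correlation 0: a change of 0.4 was needed for 75% correct.
dz_ref0 = fisher_z(0.4) - fisher_z(0.0)

# Reference correlation 1: z(1) is infinite, so use the finite value
# 2.4 assigned to r = 1; a change of 0.04 was needed for 75% correct.
Z_AT_R1 = 2.4
dz_ref1 = Z_AT_R1 - fisher_z(1.0 - 0.04)

print(f"reference 0: dz = {dz_ref0:.3f}")  # reference 0: dz = 0.424
print(f"reference 1: dz = {dz_ref1:.3f}")  # reference 1: dz = 0.454
```

Thresholds that differ tenfold in r correspond to similar changes in z (about 0.42 and 0.45), which is the sense in which equal changes in Fisher's z give roughly equal discriminability.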

Figure A2 plots data presented by Gabriel and Colburn
(1981; subject DO). Again, percent correct is plotted as a
function of Δ Fisher's z. Gabriel and Colburn used three
stimulus conditions: noise bands centered at 500 Hz that had
bandwidths of either 3 or 115 Hz, and a 4.7-kHz low-pass
noise. The reference correlations were either 0 or 1. The top
graph of Fig. A2 is for the condition in which the reference
correlation was 1, and the bottom graph for the condition in
which the reference correlation was 0. When the reference
correlation was 1, changes in correlation of [illegible], 0.015,
and 0.04 yielded 75% correct discriminations for the stimulus
conditions of 3 Hz, 115 Hz, and 4.7 kHz, respectively. For
the same conditions, changes in Fisher's z of 0.9, 1.4, and 3,
respectively, were needed. When the reference correlation
was 0, changes in correlation of 0.7, 0.4, and 0.28,
respectively, yielded 75% correct discriminations. In these
conditions, changes in Fisher's z of 0.95, [illegible], and
[illegible], respectively, were needed.

FIG. A2. Data taken from Gabriel and Colburn (1981). Percent correct
discriminations is plotted as a function of Δz. Two reference
correlations were used: 1 (top panel) and 0 (bottom panel). Three
dichotically presented stimuli were used: two narrow-band noise
signals centered at 500 Hz, with bandwidths of 3 and 115 Hz, and a
4.7-kHz low-pass noise.

The reader should recognize the unusual finding that,
when the reference correlation was 1, increased bandwidth
led to poorer performance. The fact that the change in
Fisher's z is monotonic with bandwidth does not provide a
quick fix for this situation. Rather, it is a manifestation of the
model's failure to adequately describe the data: contrary to
the theoretical expectation of equal variance, the estimated
distributions in the 115-Hz and 4.7-kHz conditions had
variances that depended on the reference signal's correlation. In
Fig. A2, the difference in variance is manifested as a change
in slope, the slope being shallower (larger variance) when
the reference correlation was 1. Only in the 3-Hz condition
do equal changes in Fisher's z lead to equal discriminability.
The ability of listeners to discriminate between
simultaneously presented bands of noise whose envelopes were either
the same or statistically independent was determined. Bands of
noise 100 Hz wide were employed, with low and high center
frequencies of (2500, 2750), (2500, 3000), (2500, 3500), and
(4000, 4400) Hz. Average discriminations were above 90 percent
correct except for the (2500, 3500) Hz condition, which yielded an
average of 77 percent correct. Next, a factorial stimulus design
was employed to determine the relative importance of
envelope and power spectrum cues. The results indicate that, in the
absence of power spectrum cues, bands with the same envelopes could
be discriminated from bands with statistically independent
envelopes. When the envelopes were always the same, listeners were
able to discriminate between power spectra that were either the
same or different. In contrast, when the envelopes were always
different, listeners were unable to discriminate between the same
and different power spectra.

Key Words: Comodulation masking release, Envelope, Power spectrum
INTRODUCTION

Hall,  Haggard  and  Fernandes  (1984)  showed  that  the 
detectability  of  a  tone  in  a  band  of  noise  is  improved  when 
synchronous  bands  of  noise,  rather  than  independent  bands  of  noise, 
are  presented  at  frequencies  removed  from  the  critical  band 
containing the signal. This reduction in masking, termed
comodulation masking release or CMR (see also McFadden, 1986; Cohen
and Schubert, 1987a), indicates that the auditory system is
sensitive to temporal synchrony. Recent experiments (Cohen and
Schubert, 1987b; McFadden, 1987) have demonstrated that temporal
synchrony affects the detectability not just of tones, but also of
bands of noise.

In  an  experiment  motivated  by  the  CMR  paradigm,  Richards 
(1987)  asked  subjects  to  discriminate  between  simultaneously 
presented  100  Hz  wide  bands  of  noise  whose  envelopes  were  either 
the  same  or  statistically  independent.  Discriminations  were  not 
possible  unless  the  center  frequency  of  the  bands  of  noise  exceeded 
1000  Hz.  For  center  frequencies  above  1000  Hz,  discriminations  were 
above  chance  when  the  two  bands  of  noise  were  separated  by  less 
than  an  octave.  Richards  argued  that  for  high  frequencies,  the 
auditory  system  is  able  to  compare  information  contained  in  the 
envelopes  of  diotically  presented  bands  of  noise. 

This conclusion does not take into account possible cues
derived from the power spectra of Richards' stimuli. Figure 1
presents Richards' stimuli (from Richards, 1987). When the two
bands of noise had the same envelopes (left panel), the power
spectra were also the same (save for a shift in frequency).
Likewise, when the bands had statistically independent envelopes
(right panel), the spectra had phases and amplitudes that were
independent. Thus, the correlated vs. uncorrelated envelope cue
co-varied with a similar vs. dissimilar spectral cue. The same
argument applies to the CMR experiments, in which spectral cues
have not been considered.


-  Figure  1  - 

There are at least two ways in which spectral cues might
have been used in Richards' experiment. First, drawing from
the work of Green and his colleagues (Green, 1988), one might
hypothesize the existence of a 'mini-profile analyzer'. Such a
mechanism would compare the shapes of the spectra of the
simultaneously presented bands of noise. It seems unlikely that
such a mechanism was available in Richards' experiment, since the
100-Hz-wide bands of noise were narrow relative to the high-
frequency critical bands. A second, seemingly more likely strategy
would be to compare the total energy of the two bands of noise.

When  the  two  bands  of  noise  had  identical  envelopes,  the  total 
energy  in  each  spectral  region  was  equal.  In  contrast,  when  the 
bands  of  noise  were  independent,  the  total  energy  of  the  two  bands 
of  noise  differed.  Thus,  a  gross,  simultaneous  energy  comparison 
would  afford  an  opportunity  to  discriminate  between  correlated  and 
uncorrelated  bands  of  noise. 
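A minimal sketch of this gross energy cue, assuming the band construction given later in the Stimuli section (eleven tones, i = -5 to 5, with Rayleigh-distributed amplitudes; σ = 1 is our assumption). For tones at distinct frequencies the long-term mean power is Σa_i²/2, so it depends only on the amplitudes: identical bands always give a 0-dB level difference, whereas independently drawn bands generally do not. This is an illustration, not the author's analysis.

```python
import math
import random

M = 5  # tones i = -M..M (assumed: 11 tones, as in the stimulus equations)

def rayleigh_amplitudes(rng: random.Random, n: int = 2 * M + 1) -> list[float]:
    # Inverse-CDF sampling of a Rayleigh distribution with sigma = 1
    return [math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]

def mean_power(amps: list[float]) -> float:
    # Long-term mean power of sum_i a_i*cos(2*pi*f_i*t + phi_i), distinct f_i
    return sum(a * a for a in amps) / 2.0

def level_difference_db(amps_low: list[float], amps_high: list[float]) -> float:
    # Level of the high band relative to the low band, in dB
    return 10.0 * math.log10(mean_power(amps_high) / mean_power(amps_low))

rng = random.Random(1)
trials = 1000
same_db, indep_db = [], []
for _ in range(trials):
    a_low = rayleigh_amplitudes(rng)
    same_db.append(abs(level_difference_db(a_low, a_low)))   # identical bands
    a_high = rayleigh_amplitudes(rng)                        # independent band
    indep_db.append(abs(level_difference_db(a_low, a_high)))

print(f"identical bands:   max |dL| = {max(same_db):.2f} dB")
print(f"independent bands: mean |dL| = {sum(indep_db) / trials:.2f} dB")
```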
The current experiment was designed to determine the relative
importance of the envelope and power spectrum cues. To this end, a
factorial stimulus design was employed. The four stimuli are
indicated in shorthand along the top and side of Table 1. Each
waveform is the sum of two bands of noise. 'E' indicates that the
two noise bands had the same envelopes, and 'S' indicates that the
two noise bands had the same power spectra (save for the shift in
center frequency). An overbar indicates the complement: 'Ē' indicates
that the two noise bands had different envelopes and 'S̄' indicates
that the two noise bands had different power spectra. Using this
notation, the four possible stimuli are:

ES: The two bands of noise had the same envelopes and the same
power spectra.

ES̄: The two bands of noise had the same envelopes, but
different power spectra.

ĒS: The two bands of noise had different envelopes, but the
same power spectra.

ĒS̄: The two bands of noise had different envelopes and
different power spectra.

In a 2IFC paradigm, listeners were asked to discriminate
between two of the stimuli described above. All possible stimulus
comparisons were tested, yielding six experimental conditions.
These are represented by the cells of Table 1. For example,
discriminating between ES and ĒS̄ waveforms (lower left-hand corner)
is the condition examined by Richards: in one interval of the 2IFC
presentation the bands of noise were the same (ES), and in the
other interval the two bands of noise were statistically
independent (ĒS̄). Note that the ES and ĒS̄ conditions are analogous
to the 'correlated' (sometimes called 'coherent') and
'uncorrelated' (sometimes called 'independent') cue conditions
often used in CMR experiments (McFadden, 1986; Cohen and Schubert,
1987).
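The six conditions are all pairs drawn from the four factorial stimuli. The small sketch below enumerates them and reports which cue changes within each pair. The labels and the dictionary layout are ours, for illustration only.

```python
from itertools import combinations

# Each stimulus label maps to (same envelopes?, same power spectra?)
STIMULI = {
    "ES": (True, True),
    "ES̄": (True, False),
    "ĒS": (False, True),
    "ĒS̄": (False, False),
}

conditions = list(combinations(STIMULI, 2))   # all pairwise comparisons
diffs = {}
for a, b in conditions:
    # Cues whose value differs between the two intervals of the 2IFC trial
    changed = [cue for cue, x, y in zip(("envelope", "spectrum"), STIMULI[a], STIMULI[b]) if x != y]
    diffs[(a, b)] = changed
    print(f"[{a}, {b}] differs in: {' and '.join(changed)}")
```

Only two cells, [ES, ĒS̄] and [ES̄, ĒS], change both cues at once; in the other four, exactly one cue changes.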


Table  1 


I.  Stimuli 

The stimuli will be referred to as described above: ES, ES̄,
ĒS, and ĒS̄. Each was the sum of two bands of noise, which were
computer-generated in accordance with the following equations.

1.  ES 

The ES bands of noise are as presented in the left panel of
Figure 1. Each of the two bands of noise is the sum of several
cosines:

$$w_1(t) = \sum_{i=-m}^{m} a_i \cos\left(2\pi (f_L + \delta i)\, t + \phi_i\right) \qquad (1a)$$

$$w_2(t) = \sum_{i=-m}^{m} a_i \cos\left(2\pi (f_L + \Delta f + \delta i)\, t + \phi_i\right), \qquad (1b)$$

where δ is the frequency separation of the tones added to generate
the noise and Δf is the difference in center frequency between the
two bands of noise, w1(t) and w2(t). Here δ was 10 Hz and m was 5,
yielding a band of noise nominally 100 Hz in width. The amplitudes
(a_i) were chosen at random from Rayleigh-distributed values, and
the phases (φ_i) were chosen from values uniformly distributed
between zero and 2π. Again, ES bands have the feature that the
envelopes and power spectra are the same.
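Eqns. (1a) and (1b) can be sketched directly as sums of cosines. The sketch assumes Rayleigh σ = 1, the (2500, 2750) Hz pair, and 100 ms of signal at the 14.3-kHz rate quoted in the Procedure; the helper names are hypothetical. Both bands reuse the same a_i and φ_i, so at t = 0 the two waveforms are identical.

```python
import math
import random

DELTA = 10.0   # component spacing delta (Hz), from the text
M = 5          # i = -M..M, so the band is nominally 100 Hz wide
FS = 14300.0   # sampling rate (Hz) quoted in the Procedure section

def draw_components(rng):
    """Rayleigh amplitudes (sigma = 1, assumed) and uniform phases in [0, 2*pi)."""
    n = 2 * M + 1
    amps = [math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return amps, phases

def noise_band(amps, phases, fc, n_samples, fs=FS):
    """w(t) = sum_i a_i*cos(2*pi*(fc + DELTA*i)*t + phi_i), i = -M..M."""
    offsets = range(-M, M + 1)
    return [
        sum(a * math.cos(2.0 * math.pi * (fc + DELTA * i) * (k / fs) + p)
            for i, a, p in zip(offsets, amps, phases))
        for k in range(n_samples)
    ]

rng = random.Random(0)
amps, phases = draw_components(rng)
f_low, delta_f = 2500.0, 250.0                      # e.g., the (2500, 2750) Hz pair
n = round(0.100 * FS)                               # 100 ms of signal
w1 = noise_band(amps, phases, f_low, n)             # Eq. (1a)
w2 = noise_band(amps, phases, f_low + delta_f, n)   # Eq. (1b): same a_i and phi_i
```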

2. ES̄

The ES̄ bands of noise have the same envelopes but different
power spectra. In practice, the ES̄ stimuli were created using the
same stimulus parameters described in Eqns. (1), but the order of
the amplitudes was reversed, and the order of the phases was
reversed and the phases multiplied by -1:

$$w_1(t) = \sum_{i=-m}^{m} a_i \cos\left(2\pi (f_L + \delta i)\, t + \phi_i\right) \qquad (2a)$$

$$w_2(t) = \sum_{i=-m}^{m} a_{-i} \cos\left(2\pi (f_L + \Delta f + \delta i)\, t - \phi_{-i}\right). \qquad (2b)$$

It should be noted that the power spectra are not statistically
independent, but uncorrelated (one is the mirror image of the
other). The Appendix shows that these two noise bands have the same
envelopes.
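The claim that Eqns. (2a) and (2b) share an envelope can be checked numerically. For a sum of positive-frequency cosines, w + jŵ equals Σ a_i exp(j(...)), so the Hilbert envelope is the magnitude of that complex sum. This sketch assumes Rayleigh σ = 1, the (2500, 2750) Hz pair, and a fixed random seed; it is a check of the algebra, not the author's procedure.

```python
import cmath
import math
import random

DELTA, M, FS = 10.0, 5, 14300.0   # spacing (Hz), tones i = -M..M, sampling rate (Hz)

def envelope(amps, phases, fc, t):
    """Hilbert envelope of sum_i a_i*cos(2*pi*(fc + DELTA*i)*t + phi_i).

    w + j*hilbert(w) = sum_i a_i*exp(j*(...)) for positive-frequency cosines,
    so sqrt(w**2 + hilbert(w)**2) is the magnitude of that complex sum.
    """
    z = sum(a * cmath.exp(1j * (2.0 * math.pi * (fc + DELTA * i) * t + p))
            for i, a, p in zip(range(-M, M + 1), amps, phases))
    return abs(z)

rng = random.Random(2)
n = 2 * M + 1
a = [math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]
phi = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]

# Eq. (2b): component i of the upper band uses a_{-i} and -phi_{-i}
a_rev = a[::-1]
phi_rev_neg = [-p for p in phi[::-1]]

times = [k / FS for k in range(round(0.1 * FS))]
env_low = [envelope(a, phi, 2500.0, t) for t in times]
env_high = [envelope(a_rev, phi_rev_neg, 2750.0, t) for t in times]
max_diff = max(abs(x - y) for x, y in zip(env_low, env_high))
print(f"max envelope difference: {max_diff:.2e}")  # rounding-error size
```

The amplitude lists `a` and `a_rev` differ (the spectra are mirror images), yet the envelopes agree to within floating-point rounding.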

3. ĒS

For the ĒS bands, the amplitudes of the tones that composed
the two bands of noise were the same, but the phases were chosen
independently:

$$w_1(t) = \sum_{i=-m}^{m} a_i \cos\left(2\pi (f_L + \delta i)\, t + \phi_i\right) \qquad (3a)$$

$$w_2(t) = \sum_{i=-m}^{m} a_i \cos\left(2\pi (f_L + \Delta f + \delta i)\, t + \theta_i\right), \qquad (3b)$$

where φ_i and θ_i were chosen independently from values uniformly
distributed from zero to 2π. The effect of phase randomization is
to generate noise bands with similar power spectra, but different
envelopes.

4. ĒS̄

The ĒS̄ noise bands were generated in a manner similar to Eqns.
(1), (2), and (3), except that the amplitudes and phases were
chosen independently for the two noise bands. The construction will
be detailed in the Procedure section. An example of an ĒS̄ waveform
is shown in the right panel of Fig. 1.
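The four constructions can be summarized as recipes for the component parameters of the low and high bands, and the 2 × 2 design checked by comparing envelopes and amplitude spectra. The sketch assumes Rayleigh σ = 1 and a fixed seed; `make_pair` is a hypothetical helper. The envelope is computed at baseband because, as noted in the Appendix, the center frequency does not affect its shape.

```python
import cmath
import math
import random

DELTA, M, FS = 10.0, 5, 14300.0

def draw(rng):
    n = 2 * M + 1
    amps = [math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return amps, phases

def make_pair(kind, rng):
    """(amps, phases) for the low and high bands of one stimulus type."""
    a, phi = draw(rng)
    if kind == "ES":    # Eqns. (1): identical parameters
        return (a, phi), (a, phi)
    if kind == "ES̄":    # Eqns. (2): reversed amps, reversed-and-negated phases
        return (a, phi), (a[::-1], [-p for p in phi[::-1]])
    if kind == "ĒS":    # Eqns. (3): same amps, independent phases
        _, theta = draw(rng)
        return (a, phi), (a, theta)
    if kind == "ĒS̄":    # independent amps and phases
        return (a, phi), draw(rng)
    raise ValueError(kind)

def envelope(amps, phases, t):
    # Baseband envelope: |exp(j*2*pi*fc*t)| = 1, so fc drops out
    return abs(sum(a * cmath.exp(1j * (2.0 * math.pi * DELTA * i * t + p))
                   for i, a, p in zip(range(-M, M + 1), amps, phases)))

def same_envelope(low, high, n=1430):
    ts = [k / FS for k in range(n)]
    return max(abs(envelope(*low, t) - envelope(*high, t)) for t in ts) < 1e-9

def same_spectrum(low, high):
    return low[0] == high[0]   # same amplitude at each relative component index

rng = random.Random(3)
results = []
for kind in ("ES", "ES̄", "ĒS", "ĒS̄"):
    low, high = make_pair(kind, rng)
    results.append((same_envelope(low, high), same_spectrum(low, high)))
    print(kind, results[-1])
```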

II.  Procedure 

In a 2IFC paradigm, listeners attempted to discriminate
between two signals that differed in one or more aspects (i.e.,
envelopes and/or power spectra). For example, listeners might
indicate which of two intervals contained the two bands of noise
whose envelopes were the same. The percentage of correct
discriminations was the dependent variable.

The six conditions tested are represented by the cells of
Table 1. The first column of Table 1 presents those conditions in
which listeners discriminated between the ES bands, which shared
both envelopes and spectra, and bands that differed in one or more
aspects: the ES̄ bands had different spectra, the ĒS bands had
different envelopes, and the ĒS̄ bands had different envelopes and
different spectra. The bottom row of Table 1 presents those
conditions in which listeners discriminated between independent
bands of noise (ĒS̄) and bands that shared at least one aspect:
envelopes, spectra, or both. For example, in the [ES̄, ĒS̄] condition,
the ES̄ bands were presented in one interval, and the ĒS̄ bands were
presented in the other interval. Listeners were asked to indicate
which signal was composed of bands that had identical envelopes
(ES̄).


The cells of Table 1 present the center frequencies (f_L, f_L + Δf)
that were used. All comparisons included bands of noise centered at
2500 and 2750 Hz, which were chosen because subjects were able to
discriminate between the ES and ĒS̄ waveforms on an average of 98
percent of the trials. It was reasoned that the use of the 2500- and
2750-Hz center frequencies would allow ample opportunity for lower
scores to be observed. As presented in Table 1, four pairs of
center frequencies were examined in the [ES, ĒS̄] and [ES̄, ĒS̄] test
conditions.

The different conditions were completed in random order,
except that all listeners completed the [ES, ĒS̄] conditions first.
For each of the six experimental conditions represented in Table 1,
the row and column stimuli had equal a priori probability of being
in the first interval.

The stimuli were generated on an IBM-PC microcomputer.
Depending on the stimulus condition, one of two presentation
algorithms was followed. For the conditions presented along the
bottom row of Table 1, thirty-two low- and high-frequency noise
band pairs were generated and stored. The column stimulus was the
sum of the 'paired' low- and high-frequency bands, and the ĒS̄
stimulus was the sum of independently chosen low- and high-
frequency noise bands. For the remaining conditions ([ES, ES̄],
[ES, ĒS], and [ES̄, ĒS]), thirty-two low- and high-frequency pairs
were generated and stored, sixteen pairs for the row stimulus and
sixteen pairs for the column stimulus. The stimuli were chosen
randomly from among each set of sixteen pairs.
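The bottom-row presentation algorithm can be sketched as follows: one stored pair is summed for the 'paired' stimulus, a low band and a high band are drawn independently for the ĒS̄ stimulus, and the two are assigned to intervals with equal probability. The function name and the dummy pool are hypothetical. The text does not say whether an independent draw that happens to pick the same stored pair (probability 1/32 here) was excluded; this sketch does not exclude it.

```python
import random

def make_bottom_row_trial(pool, rng):
    """One 2IFC trial for a bottom-row condition of Table 1 (hypothetical helper).

    pool: list of (low_band, high_band) sample lists, e.g. 32 stored pairs.
    Returns ((interval1, interval2), target), where target (1 or 2) is the
    interval holding the 'paired' stimulus.
    """
    low, high = pool[rng.randrange(len(pool))]
    paired = [x + y for x, y in zip(low, high)]
    # ĒS̄ stimulus: low and high bands chosen independently from the pool
    low_j = pool[rng.randrange(len(pool))][0]
    high_l = pool[rng.randrange(len(pool))][1]
    unpaired = [x + y for x, y in zip(low_j, high_l)]
    # Equal a priori probability of either stimulus in the first interval
    if rng.random() < 0.5:
        return (paired, unpaired), 1
    return (unpaired, paired), 2

rng = random.Random(4)
# Dummy 4-sample "bands" stand in for the stored noise waveforms
pool = [([float(k)] * 4, [10.0 * k] * 4) for k in range(32)]
targets = [make_bottom_row_trial(pool, rng)[1] for _ in range(2000)]
print(targets.count(1) / len(targets))  # should be near 0.5
```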

All  waveforms  were  played  through  a  12-bit  D/A  at  a  sampling 
rate  of  14.3  kHz  and  low-pass  filtered  (Kemo  VBF/23)  at  6  kHz.  The 
signal  duration  was  100  ms,  plus  5-ms  cosine-squared  onset/offset 
ramps.  The  two  listening  intervals  were  separated  by  300  ms,  and 
both  intervals  were  visually  indicated  on  a  display  screen. 
Following  the  listener's  response,  visual  feedback  was  displayed 
for  240  ms. 
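The gating parameters translate into sample counts at the 14.3-kHz rate: 100 ms is 1430 samples and each 5-ms ramp is 71.5 samples, which the sketch rounds to 72. It assumes the 100 ms excludes the ramps (as "plus 5-ms ramps" suggests) and uses a sin² rise, which is the usual cosine-squared ramp; the function name is hypothetical.

```python
import math

FS = 14300.0        # sampling rate (Hz)
STEADY_S = 0.100    # 100-ms signal, taken here as the steady-state portion
RAMP_S = 0.005      # 5-ms cosine-squared onset and offset ramps

def gate(fs=FS, steady_s=STEADY_S, ramp_s=RAMP_S):
    """Gating envelope: cos^2-shaped rise, flat top, mirror-image fall."""
    n_ramp = round(ramp_s * fs)      # 71.5 samples at 14.3 kHz; round() gives 72
    n_steady = round(steady_s * fs)  # 1430 samples
    rise = [math.sin(0.5 * math.pi * k / n_ramp) ** 2 for k in range(n_ramp)]
    return rise + [1.0] * n_steady + rise[::-1]

g = gate()
print(len(g), f"{1000 * len(g) / FS:.1f} ms")   # total samples and duration
```

Multiplying a generated noise waveform sample by sample with `g` applies the onset and offset ramps.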

The  stimuli  were  presented  diotically  via  Sennheiser  HD  414  SL 
earphones  at  an  average  total  level  of  75  dB  SPL.  Subjects  listened 
in  a  sound-treated  room.  Conditions  were  tested  in  50-trial  blocks, 
six  blocks  at  a  time.  Thus,  each  data  point  is  based  on  300  trials. 
Listeners  typically  completed  18  to  24  blocks  a  day.  Each  condition 
was  completed  before  practice  for  the  next  condition  was  begun. 

Listeners  were  undergraduate  students  paid  to  participate, 
except  GR,  who  is  the  author.  All  had  previously  participated  in 
similar  experiments,  and  little  practice  was  completed  prior  to 
data  collection.  When  a  novel  condition  was  introduced,  listeners 
typically  required  only  50  to  100  trials  in  order  to  'learn'  the 
task.  In  order  to  assess  possible  effects  of  practice, 
approximately  1/3  of  the  conditions  were  repeated.  Of  those 
repeated,  about  10%  led  to  significantly  different  averages.  For 
the repeated conditions, only the average of the last 300 trials
is reported.
III. Results

Table  2  lists  the  average  percent  correct  discriminations  for 
each  of  the  conditions  tested.  The  averages  are  based  on  four 
subjects.  The  standard  errors  of  the  mean,  based  on  averages  of  the 
four  listeners  and  six  estimates  per  listener,  are  indicated  in 
parentheses.  The  presentation  parallels  the  conditions  of  Table  1, 
the  scores  referring  to  the  center  frequencies  indicated  in  Table 
1. 


-  Table  2  - 

Figure 2 presents the data for two of the experimental
conditions, [ES, ĒS̄] (open bars) and [ES̄, ĒS̄] (filled bars). For
each  subject,  a  histogram  plots  the  percent  correct  for  each 
(fL,fL+Af)  center  frequency  pair  tested.  The  error  bars  indicate 
the  standard  errors  of  the  mean. 

-  Figure  2  - 

First consider the data obtained in the [ES, ĒS̄] condition
(open bars). For all four subjects, increasing the center frequency
of the higher noise band from 2750 to 3500 Hz (holding the lower
frequency noise band fixed at 2500 Hz) yielded poorer performance.
The extent of the drop in performance was subject dependent, with
subjects VF and KK changing relatively little, and subjects GR and
MS changing relatively more. In the (4000, 4400) Hz condition,
discrimination was nearly perfect. These data constitute a
replication of the experiment presented by Richards (1987), with
the same results.

Next consider the data in the [ES̄, ĒS̄] condition (filled bars),
which is similar to the [ES, ĒS̄] condition, except that the two
bands of noise always had different power spectra. Clearly,
envelope similarity is detectable in the absence of co-varying
spectral cues. For three of the four subjects (KK, GR, and MS),
increasing the frequency separation of the two noise bands led to
decreased discriminability. Reading from Fig. 2, for fifteen of the
sixteen possible comparisons, performance levels in the [ES, ĒS̄]
condition were superior to performance levels obtained in the
[ES̄, ĒS̄] condition. On average, the difference was thirteen
percentage points, but the effect was subject dependent. Subjects
VF and KK were little affected (an average difference of 5.2
percentage points), while subjects GR and MS showed larger changes
(an average difference of 20.5 percentage points).

The data presented in Figure 2 indicate that listeners were
able to discriminate between bands of noise with either identical
or statistically independent envelopes, even when spectral cues
were absent. For two of the subjects, however, performance levels
were superior when both spectral and envelope cues were available.
The relatively small change observed for the other two subjects
may reflect a performance ceiling in the [ES, ĒS̄] condition.

Figure 3 presents the data for the [ES, ES̄], [ES, ĒS], [ES̄, ĒS],
and [ĒS, ĒS̄] conditions. The bands of noise were centered at
2500 and 2750 Hz. In the [ĒS, ĒS̄] condition, none of the four
listeners was able to perform reliably above chance (50%
correct), even though three of the four subjects completed in
excess of 1000 trials. That is, when the envelopes were always
different, changes in power spectra were not detectable. Most of
the efforts were limited to center frequencies of 2500 and 2750 Hz,
but other frequencies were occasionally tested. In no instance did
performance appear to rise above chance. These data indicate that
simultaneous comparisons of either spectral shape or energy levels
are not an important aspect of what has been termed envelope
correlation detection (Richards, 1987).
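One way to make "reliably above chance" concrete, assuming a one-sided exact binomial test at α = .05 on a single 300-trial estimate (the text does not state its criterion), is to find the smallest number of correct responses whose probability under guessing is at most α. With 300 trials this comes to roughly 165 correct, or about 55 percent. The helper names are ours.

```python
from math import comb

def upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1.0 - p) ** (n - x) for x in range(k, n + 1))

def critical_count(n: int, alpha: float = 0.05, p: float = 0.5) -> int:
    """Smallest number correct whose one-sided chance probability is <= alpha."""
    k = 0
    while upper_tail(k, n, p) > alpha:
        k += 1
    return k

n_trials = 300                       # trials per data point (see Procedure)
k_crit = critical_count(n_trials)
print(k_crit, f"{100 * k_crit / n_trials:.1f}% correct")
```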

Figure 3 shows that subjects were able to discriminate between
the ES and ES̄ waveforms (an average of 85 percent correct
discriminations). That is, when both intervals contained noise
bands with identical envelopes, changes in power spectra were
detectable. This contrasts with the finding that the subjects were
unable to discriminate between similar and dissimilar power spectra
when the noise bands had dissimilar envelopes (the [ĒS, ĒS̄]
condition described above).

-  Figure  3  - 

In the [ES, ĒS] condition, the listeners discriminated between
bands of noise whose envelopes were either identical or different,
but the noise bands always had the same power spectra. The fact
that discriminations were good (an average of 97 percent correct)
supports the notion that similar and dissimilar envelopes are
discriminable even when spectral and envelope cues do not co-vary.
Finally, subjects were able to discriminate, with an average
accuracy of 84 percent correct, between the ES̄ waveforms, which had
identical envelopes and different power spectra, and the ĒS
waveforms, which had similar spectra but different envelopes. This
finding is difficult to interpret because the role of spectral cues
is unclear.

In  summary,  listeners  were  always  able  to  discriminate  between 
waveforms  composed  of  bands  of  noise  that  had  identical  envelopes 
and  waveforms  composed  of  bands  of  noise  that  had  independent 
envelopes.  When  the  bands  of  noise  had  power  spectra  that  were 
either  the  same  or  different,  listeners  were  unable  to  discriminate 
the  change  unless  the  bands  of  noise  had  identical  envelopes. 
Finally, subjects performed well in the [ES̄, ĒS] condition, in which
both  envelopes  and  power  spectra  changed.  In  that  condition,  it  is 
not  clear  whether  the  discrimination  was  based  on  changes  in 
envelopes,  changes  in  power  spectra,  or  both.  It  seems  reasonable 
to  assume  that  the  discriminations  were  based  on  changes  in 
envelope  similarity,  since  changes  in  spectral  similarity  were  not 
detectable  unless  the  noise  bands  had  the  same  envelopes. 

IV.  Discussion 

The data presented above introduce two discrepancies, both
involving the discriminability of changes in the power spectrum of
the narrow bands of noise. The first involves the data of Figure 2:
if removing power spectrum cues led to a reduction in the
discriminability of envelope similarity ([ES, ĒS̄] vs. [ES̄, ĒS̄], an
average difference of 13 percentage points), why were subjects
unable to detect (an average of 50% correct) the change in power
spectra between the ĒS and ĒS̄ waveforms? Second, given that
subjects discriminated between the ES and ES̄ waveforms on an
average of 85 percent of the trials, why were listeners unable to
exceed chance performance in the [ĒS, ĒS̄] condition?

Before  examining  these  inconsistencies  at  length,  the  reader 
should  be  made  aware  of  potential  problems  in  the  experimental 
procedure.  First,  as  has  been  observed  in  other  CMR  experiments 
(McFadden,  1986;  Buus,  1985),  there  are  considerable  between- 
subject  differences.  Indeed,  for  the  current  experiment,  a 
consistent  ranking  of  the  dependent  variable  as  a  function  of  the 
condition  across  the  four  listeners  cannot  be  achieved. 

Second,  due  to  the  large  number  of  conditions  tested,  there  is 
no  guarantee  that  the  subjects  were  comfortable  with  the 
discrimination  response  required  for  each  of  the  conditions  tested. 
Although  listeners  gave  no  indication  of  confusion,  it  is  possible 
that  other  experimental  procedures  (for  example,  a  same-different 
procedure)  or  further  practice  might  have  altered  the  magnitude  of 
the  performance  levels.  Certainly  longer  signal  durations  would 
have led to superior performance levels (Richards, 1988), but
whether  the  relative  performance  levels  would  have  been  altered 
remains  to  be  determined.  For  these  reasons,  moderate  changes  in 
performance  level  deserve  little  emphasis.  While  such  limitations 
argue  for  further  experimentation,  we  do  not  feel  that  they  are  so 
crippling  as  to  affect  the  basic  observations  noted  above.  The 
difference between discriminability ([ES, ES̄]) and
indiscriminability ([ĒS, ĒS̄]) is clear, and so will be the primary
comparison of interest. The difference in performance levels
between the [ES, ĒS̄] and the [ES̄, ĒS̄] conditions will receive little
attention, as the difference was subject dependent.

In order to address the change in performance levels between
the [ES, ES̄] and the [ĒS, ĒS̄] conditions, we shall examine an
assumption  that  the  two  dimensions,  envelope  and  spectral,  provide 
a  sufficient  basis  from  which  to  describe  the  discriminations.  The 
possibility  that  these  dimensions  are  not  sufficient  is  explored 
below,  but  no  satisfactory  explanation  of  the  data  is  presented. 

Are  envelope  and  spectral  considerations  sufficient?  To  this 
point  we  have  ignored  changes  in  the  'fine  structure'  of  the 
signals.  Consider  the  bands  of  noise  that  made  up  the  ES  waveforms. 
The bands had the same envelopes and the same power spectrum (save
for a shift in center frequency). Further, the phase function (the
phase of the fine structure as a function of time; Davenport and
Root, 1958) was the same. In contrast, the bands that made up the
ES̄ waveforms had identical envelopes, but the power spectrum and
the  phase  functions  were  not  the  same.  Might  dynamic  changes  in  the 
phase  functions  affect  the  discriminations? 

A  direct  comparison  of  the  phase  functions  seems  an  unlikely 
cue;  the  fine  structure  is  not  'extractable'  at  the  high 
frequencies  used  here.  However,  there  is  the  possibility  that 
changes  in  fine  structure  might  be  manifested  in  other  ways. 

Initially we considered McFadden's (1987; see also McFadden,
1975) observation that for CMR experiments, and by extension the
current experiment, it is often unreasonable to assume that the
envelopes extracted from the low- and high-frequency regions
match. The higher-frequency band is probably better described as a
sum of the high- and low-frequency bands. This point is especially
germane here, since the experimental bands were relatively close,
with the center frequencies separated by no more than a
critical band. For these reasons we considered the envelopes of the
summed ES, ES̄, ĒS, and ĒS̄ waveforms.

The bands of noise that comprise the ES, ES̄, ĒS, and ĒS̄
waveforms were generated, the paired bands added, and the envelopes
of the resulting waveforms extracted. The root-mean-square (RMS)
values of the extracted envelopes were then computed. The RMS
values of the ES and ES̄ waveforms did not differ significantly (for
similar parameter conditions, the envelopes of the summed waveforms
were indistinguishable). Nor were the RMS values for the ĒS and ĒS̄
waveforms significantly different. The RMS values for the ES and ES̄
waveforms were, however, significantly larger than the RMS values
for the ĒS and ĒS̄ waveforms. Thus, the change in the RMS values of
the envelopes of the summed waveform cannot account for the
difference in performance levels for the [ES, ES̄] and the [ĒS, ĒS̄]
conditions. The simulations do, however, indicate that the
peak-to-valley ratio, or 'modulation depth', of the envelopes of the
summed waveforms may contribute to the discrimination between
simultaneously presented bands of noise that have either identical
(ES and ES̄) or statistically independent (ĒS and ĒS̄) envelopes.
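A sketch of the kind of summed-waveform statistics described above. The text does not specify the envelope extractor, the analysis window, or exactly how the RMS and peak-to-valley ratio were computed. This sketch assumes the Hilbert envelope of the summed waveform (obtained as the magnitude of the complex sum of components), 100 ms sampled at 14.3 kHz, RMS over those samples, and the max/min ratio in dB. It is not the author's simulation and is not meant to reproduce its statistics.

```python
import cmath
import math
import random

DELTA, M, FS = 10.0, 5, 14300.0

def draw(rng):
    n = 2 * M + 1
    return ([math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)],
            [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)])

def summed_envelope(low, high, f_low, f_high, n_samples):
    """Hilbert envelope of the sum of two bands, each sum_i a_i*cos(2*pi*(fc+DELTA*i)*t + phi_i)."""
    env = []
    for k in range(n_samples):
        t = k / FS
        z = 0j
        for (amps, phases), fc in ((low, f_low), (high, f_high)):
            for i, a, p in zip(range(-M, M + 1), amps, phases):
                z += a * cmath.exp(1j * (2.0 * math.pi * (fc + DELTA * i) * t + p))
        env.append(abs(z))
    return env

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def peak_to_valley_db(x):
    return 20.0 * math.log10(max(x) / max(min(x), 1e-12))   # guard against a zero minimum

rng = random.Random(5)
low = draw(rng)
env_es = summed_envelope(low, low, 2500.0, 2750.0, 1430)          # ES: identical parameters
env_indep = summed_envelope(low, draw(rng), 2500.0, 2750.0, 1430)  # ĒS̄: independent band
for name, env in (("ES", env_es), ("ĒS̄", env_indep)):
    print(f"{name}: RMS = {rms(env):.2f}, peak-to-valley = {peak_to_valley_db(env):.1f} dB")
```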

It  is  evident  that  changes  in  the  phase  functions  do  not  alter 
the  summed  bands  in  a  manner  consistent  with  our  results.  We  are 
currently  investigating  the  extent  to  which  other  peripheral 
interactions may contribute to the discrimination between waveforms
composed  of  correlated  and  independent  bands  of  noise. 

V.  Summary 

(1)  The  ability  to  discriminate  between  bands  of  noise  whose 
envelopes  were  either  the  same  or  statistically  independent  was 
accomplished in the absence of spectral cues. If spectral cues
co-varied with envelope cues, performance levels were typically
superior  to  those  obtained  in  the  absence  of  co-varying  cues. 

(2)  When  the  bands  of  noise  had  envelopes  that  were  not  the 
same,  listeners  were  unable  to  discriminate  between  noise  bands 
whose power spectra were either the same or different. When the
noise  bands  had  identical  envelopes,  listeners  were  able  to 
indicate  whether  the  bands  of  noise  had  the  same  power  spectra. 
Appendix

Here it is shown that the waveforms described by Eqns. (2a) and
(2b) have identical envelopes. To this end, the envelopes of each
of the two waveforms will be derived.

The envelope of a narrow band of noise, A(t), may be written
in terms of the original waveform, w(t), and its Hilbert transform,
ŵ(t):

$$A^2(t) = w^2(t) + \hat{w}^2(t). \qquad (A1)$$

Because the Hilbert transform is linear, the Hilbert transform of a
sum of cosines is the sum of the Hilbert transforms of the cosines.
Further, because the Hilbert transform of a cosine is a sine, the
Hilbert transform of the noise band described by Eqn. (2a),

$$w_1(t) = \sum_{i=-m}^{m} a_i \cos\!\left(2\pi (f_L + \delta i)\,t + \phi_i\right), \tag{2a}$$

is

$$\hat{w}_1(t) = \sum_{i=-m}^{m} a_i \sin\!\left(2\pi (f_L + \delta i)\,t + \phi_i\right).$$
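The sketch below evaluates Eqn. (2a), its Hilbert transform and the envelope of Eqn. (A1) numerically. It maps each cosine term to the corresponding sine term, which is valid only because every component frequency is positive. The parameter values, the random amplitudes and phases, and the function names `w_and_hilbert` and `envelope` are assumptions made for illustration.

```python
import math
import random

# Assumed illustration values: lower-band centre F_L (Hz), component
# spacing DELTA (Hz), and 2*M + 1 components.
F_L, DELTA, M = 2500.0, 5.0, 10


def w_and_hilbert(amps, phases, t, f_c=F_L):
    """Return w(t) from Eqn. (2a) and its Hilbert transform at time t.

    Each term a_i*cos(theta_i) is mapped to a_i*sin(theta_i); this holds
    because every component frequency f_c + DELTA*i is positive.
    """
    w = 0.0
    w_hat = 0.0
    for i in range(-M, M + 1):
        theta = 2.0 * math.pi * (f_c + DELTA * i) * t + phases[i]
        w += amps[i] * math.cos(theta)
        w_hat += amps[i] * math.sin(theta)
    return w, w_hat


def envelope(amps, phases, t, f_c=F_L):
    """Envelope A(t) from Eqn. (A1): A^2(t) = w^2(t) + w_hat^2(t)."""
    w, w_hat = w_and_hilbert(amps, phases, t, f_c)
    return math.hypot(w, w_hat)


rng = random.Random(0)
amps = {i: rng.uniform(0.1, 1.0) for i in range(-M, M + 1)}
phases = {i: rng.uniform(0.0, 2.0 * math.pi) for i in range(-M, M + 1)}
for t in (0.0, 0.01, 0.02):
    w, _ = w_and_hilbert(amps, phases, t)
    print(f"t = {t:.2f} s   w = {w:+.3f}   A = {envelope(amps, phases, t):.3f}")
```

A useful sanity check on this construction is that the envelope always bounds the waveform, $|w(t)| \le A(t)$. For a single component, the envelope reduces to that component's amplitude.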


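Squaring and adding:

$$A_1^2(t) = w_1^2(t) + \hat{w}_1^2(t) = \left(\sum_{i=-m}^{m} a_i \cos\!\left(2\pi (f_L + \delta i)\,t + \phi_i\right)\right)^2 + \left(\sum_{i=-m}^{m} a_i \sin\!\left(2\pi (f_L + \delta i)\,t + \phi_i\right)\right)^2.$$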

Expanding the right-hand side,

$$
\begin{aligned}
A_1^2(t) = {} & \sum_{i=-m}^{m} a_i^2 \cos^2\!\left(2\pi (f_L + \delta i)\,t + \phi_i\right) \\
& + 2\sum_{i=-m}^{m-1} \sum_{j=i+1}^{m} a_i a_j \cos\!\left(2\pi (f_L + \delta i)\,t + \phi_i\right) \cos\!\left(2\pi (f_L + \delta j)\,t + \phi_j\right) \\
& + \sum_{i=-m}^{m} a_i^2 \sin^2\!\left(2\pi (f_L + \delta i)\,t + \phi_i\right) \\
& + 2\sum_{i=-m}^{m-1} \sum_{j=i+1}^{m} a_i a_j \sin\!\left(2\pi (f_L + \delta i)\,t + \phi_i\right) \sin\!\left(2\pi (f_L + \delta j)\,t + \phi_j\right).
\end{aligned}
$$


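Because $\cos^2\alpha + \sin^2\alpha = 1$, and because
$\cos\alpha\cos\beta + \sin\alpha\sin\beta = \cos(\alpha - \beta)$,
$A_1^2(t)$ may be written as

$$A_1^2(t) = \sum_{i=-m}^{m} a_i^2 + 2\sum_{i=-m}^{m-1} \sum_{j=i+1}^{m} a_i a_j \cos\!\left(2\pi\delta (i-j)\,t + (\phi_i - \phi_j)\right). \tag{A2}$$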
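The step from the squared sums to (A2) can be checked numerically. The sketch below evaluates $A_1^2(t)$ in two ways: directly from the cosine and sine sums, and from the closed form (A2). It uses random, hypothetical amplitudes and phases and two arbitrarily chosen centre frequencies. If the two results agree to within rounding error, that confirms the trigonometric identities used above and shows that $f_L$ drops out. `DELTA`, `M` and the function names are assumptions.

```python
import math
import random

# Assumed illustration values (component spacing DELTA in Hz, 2*M + 1 components).
DELTA, M = 5.0, 10


def envelope_sq_direct(amps, phases, t, f_c):
    """A_1^2(t) = w_1^2(t) + w_hat_1^2(t), summing the terms of Eqn. (2a) directly."""
    c = s = 0.0
    for i in range(-M, M + 1):
        theta = 2.0 * math.pi * (f_c + DELTA * i) * t + phases[i]
        c += amps[i] * math.cos(theta)
        s += amps[i] * math.sin(theta)
    return c * c + s * s


def envelope_sq_a2(amps, phases, t):
    """Closed form (A2); the centre frequency does not appear."""
    total = sum(amps[i] ** 2 for i in range(-M, M + 1))
    for i in range(-M, M):             # i = -m .. m-1
        for j in range(i + 1, M + 1):  # j = i+1 .. m
            arg = 2.0 * math.pi * DELTA * (i - j) * t + (phases[i] - phases[j])
            total += 2.0 * amps[i] * amps[j] * math.cos(arg)
    return total


rng = random.Random(3)
amps = {i: rng.uniform(0.1, 1.0) for i in range(-M, M + 1)}
phases = {i: rng.uniform(0.0, 2.0 * math.pi) for i in range(-M, M + 1)}
times = [n / 1000.0 for n in range(50)]
worst = max(
    abs(envelope_sq_direct(amps, phases, t, f_c) - envelope_sq_a2(amps, phases, t))
    for t in times
    for f_c in (1000.0, 2500.0)
)
print(f"largest |direct - (A2)| over the grid: {worst:.2e}")
```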


Next, we shall show that the complementary waveform, $w_2(t)$
(Eqn. (2b)), has the same envelope. Although the center frequency of
a noise band has no bearing on the shape of its envelope (provided
the center frequency is large relative to the bandwidth), the
difference in center frequency, $\Delta f$, will be retained in order
to demonstrate that point.

$$w_2(t) = \sum_{i=-m}^{m} a_{-i} \cos\!\left(2\pi (f_L + \Delta f + \delta i)\,t - \phi_{-i}\right). \tag{2b}$$

Equation  (2b)  may  be  rewritten,  replacing  i  with  -i,  and  changing 
the  order  of  addition, 
$$w_2(t) = \sum_{i=-m}^{m} a_i \cos\!\left(2\pi (f_L + \Delta f - \delta i)\,t - \phi_i\right). \tag{A3}$$
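Replacing $i$ with $-i$ only reorders the terms of the sum. The sketch below shows this by evaluating (2b) and (A3) side by side. The centre frequencies follow Figure 3. `DELTA`, `M`, the random amplitudes and phases, and the function names are hypothetical.

```python
import math
import random

# Centre frequencies as in Figure 3 (F_L = 2500 Hz, DELTA_F = 250 Hz);
# DELTA and M are assumed.
F_L, DELTA_F, DELTA, M = 2500.0, 250.0, 5.0, 10


def w2_eq_2b(amps, phases, t):
    """Eqn. (2b): sum of a_{-i} cos(2 pi (f_L + df + delta i) t - phi_{-i})."""
    return sum(
        amps[-i] * math.cos(2.0 * math.pi * (F_L + DELTA_F + DELTA * i) * t - phases[-i])
        for i in range(-M, M + 1)
    )


def w2_eq_a3(amps, phases, t):
    """Eqn. (A3): the same sum after replacing i with -i."""
    return sum(
        amps[i] * math.cos(2.0 * math.pi * (F_L + DELTA_F - DELTA * i) * t - phases[i])
        for i in range(-M, M + 1)
    )


rng = random.Random(4)
amps = {i: rng.uniform(0.1, 1.0) for i in range(-M, M + 1)}
phases = {i: rng.uniform(0.0, 2.0 * math.pi) for i in range(-M, M + 1)}
for t in (0.0, 0.0123, 0.0456):
    print(f"t = {t}: (2b) = {w2_eq_2b(amps, phases, t):+.6f}, (A3) = {w2_eq_a3(amps, phases, t):+.6f}")
```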


After the change of variables, $w_2(t)$ resembles $w_1(t)$, except that
the center frequency has been increased by $\Delta f$, the phase
constant is subtracted rather than added, and the component
frequencies decrease, rather than increase, with $i$ (the amplitude
spectrum is reversed in frequency).

The Hilbert transform of $w_2(t)$ is given by

$$\hat{w}_2(t) = \sum_{i=-m}^{m} a_i \sin\!\left(2\pi (f_L + \Delta f - \delta i)\,t - \phi_i\right).$$


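Squaring and adding the waveform and its transform yields

$$A_2^2(t) = w_2^2(t) + \hat{w}_2^2(t) = \left(\sum_{i=-m}^{m} a_i \cos\!\left(2\pi (f_L + \Delta f - \delta i)\,t - \phi_i\right)\right)^2 + \left(\sum_{i=-m}^{m} a_i \sin\!\left(2\pi (f_L + \Delta f - \delta i)\,t - \phi_i\right)\right)^2.$$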

As before, we expand the right-hand side and combine like terms to
obtain

$$A_2^2(t) = \sum_{i=-m}^{m} a_i^2 + 2\sum_{i=-m}^{m-1} \sum_{j=i+1}^{m} a_i a_j \cos\!\left(2\pi\delta (j-i)\,t + (\phi_j - \phi_i)\right).$$


Since the cosine is an even function, this may be rewritten as

$$A_2^2(t) = \sum_{i=-m}^{m} a_i^2 + 2\sum_{i=-m}^{m-1} \sum_{j=i+1}^{m} a_i a_j \cos\!\left(2\pi\delta (i-j)\,t + (\phi_i - \phi_j)\right). \tag{A4}$$
Thus, the waveforms represented by Eqns. (2a) and (2b) have
identical squared envelopes ((A2) = (A4)). Because an envelope is
non-negative, they also have identical envelopes.
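As a final numerical check of the result, the sketch below builds $w_1(t)$ and $w_2(t)$ from Eqns. (2a) and (2b) with shared, randomly drawn amplitudes and phases. It then compares their (A1) envelopes on a time grid. A third band with independent amplitudes and phases is included for contrast. The centre frequencies follow Figure 3. `DELTA`, `M`, `FS` and the random draws are assumed values chosen for illustration.

```python
import math
import random

# Centre frequencies as in Figure 3; DELTA, M, FS and the random
# amplitudes/phases are assumed, illustrative values.
F_L, DELTA_F, DELTA, M = 2500.0, 250.0, 5.0, 10
FS = 10000.0  # time-grid sampling rate (Hz)


def envelope(components, t):
    """Envelope from Eqn. (A1) for a sum of (amplitude, frequency, phase) cosines."""
    c = sum(a * math.cos(2.0 * math.pi * f * t + p) for a, f, p in components)
    s = sum(a * math.sin(2.0 * math.pi * f * t + p) for a, f, p in components)
    return math.hypot(c, s)


rng = random.Random(2)
idx = range(-M, M + 1)
a = {i: rng.uniform(0.1, 1.0) for i in idx}
phi = {i: rng.uniform(0.0, 2.0 * math.pi) for i in idx}

w1 = [(a[i], F_L + DELTA * i, phi[i]) for i in idx]               # Eqn. (2a)
w2 = [(a[-i], F_L + DELTA_F + DELTA * i, -phi[-i]) for i in idx]  # Eqn. (2b)

# A third band with independent amplitudes and phases, for contrast.
b = {i: rng.uniform(0.1, 1.0) for i in idx}
psi = {i: rng.uniform(0.0, 2.0 * math.pi) for i in idx}
w3 = [(b[i], F_L + DELTA_F + DELTA * i, psi[i]) for i in idx]

grid = [n / FS for n in range(2000)]
diff_same = max(abs(envelope(w1, t) - envelope(w2, t)) for t in grid)
diff_indep = max(abs(envelope(w1, t) - envelope(w3, t)) for t in grid)
print(f"max envelope difference, (2a) vs (2b):        {diff_same:.2e}")
print(f"max envelope difference, (2a) vs independent: {diff_indep:.2f}")
```

The first difference should be at the level of floating-point rounding. The second reflects genuinely different envelopes.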
Table Captions

Table  1.  The  experimental  conditions  are  presented.  Each  cell 
indicates  discrimination  between  the  row  and  column  stimuli.  The 
center  frequencies  that  were  tested  are  indicated  in  each  cell. 

Table  2.  The  average  percent  correct  is  shown  for  each  of  the 
experimental  conditions  presented  in  Table  1.  The  standard  errors 
of  the  mean,  based  on  six  estimates  for  all  of  the  four  listeners, 
are  indicated  in  parentheses. 
Figure Captions

Figure  1.  Two  stimuli  are  shown.  The  top  panel  presents  the 
stimulus whose envelopes and power spectra are the same (ES). The
bottom panel shows two statistically independent noise bands (ES).
The  summed  waveform,  the  amplitude  spectra,  and  the  waveforms  of 
the  individual  noise  bands  are  drawn. 

Figure  2.  Percent  correct  discriminations  are  indicated  for 
two  experimental  conditions,  [ES,ES]  (open  bars)  and  [ES,ES] 
(filled  bars),  and  for  each  of  the  (fL,fL+Af)  center  frequencies 
tested.  The  data  for  each  listener  are  plotted  separately.  Error 
bars  indicate  the  standard  errors  of  the  mean. 

Figure 3. Percent correct discriminations are indicated for
the [ES,ES], [ES,ES], [ES,ES], and [ES,ES] conditions. The noise
bands were centered at 2500 and 2750 Hz.
Table 2


Average  Percent  Correct 
For  the  test  conditions  shown  in  Table  1 

